Archive for the 'Search' Category

CouponLooker Merchants and Percent Off

Thursday, June 14th, 2007

Yet another CouponLooker update went live today. The most noticeable additions are recommended sub-queries for % off coupons and specific merchants. For the specific merchants it was necessary to add a mechanism to normalize the merchant names, since the coupon feeds we get sometimes list Dell as Dell.Com, Dell Home, Dell USA, or about 40 other variations.
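As a rough illustration of the normalization idea (not the actual CouponLooker code), a minimal sketch might clean up the feed name and look it up in an alias table. The table contents, the suffix list, and the function name `normalize_merchant` are all assumptions made for this example.

```python
import re

# Hypothetical alias table: maps a cleaned-up feed name to one canonical name.
# A real table would need an entry for each of the many observed variations.
MERCHANT_ALIASES = {
    "dell": "Dell",
    "dell home": "Dell",
    "dell usa": "Dell",
}

def normalize_merchant(raw_name: str) -> str:
    """Return the canonical merchant name, or the trimmed input if unknown."""
    key = raw_name.strip().lower()
    key = re.sub(r"\.(com|net|org)$", "", key)  # "dell.com" -> "dell"
    key = re.sub(r"\s+", " ", key)              # collapse runs of whitespace
    return MERCHANT_ALIASES.get(key, raw_name.strip())

for name in ["Dell.Com", "Dell Home", "  Dell   USA ", "Acme Widgets"]:
    print(repr(name), "->", normalize_merchant(name))
```

Unknown names pass through unchanged, so a new merchant still shows up in results while the alias table catches up.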

The other, bigger project was mostly not visible. We are experimenting with sponsored coupons, which would allow the provider of a coupon feed to pay us to get a higher listing for their coupon. This is tricky since our top priority is to maintain the quality of the results we give users: just because someone is paying doesn’t mean it’s OK to show some crap unrelated coupon. So the system is built around some fairly sophisticated scoring based on the initial score, click-through rates (if a coupon is relevant, people will click it; if it’s less interesting, it gets ignored) and more.
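To make the shape of that idea concrete, here is a minimal sketch, not the real scoring. It assumes a relevance score between 0 and 1, a smoothed click-through rate, and a relevance floor below which a bid has no effect. The formula, the field names, and every constant are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Coupon:
    title: str
    relevance: float      # initial score from the search engine, assumed 0.0-1.0
    clicks: int = 0
    impressions: int = 0
    bid: float = 0.0      # 0.0 means the coupon is not sponsored

def smoothed_ctr(clicks: int, impressions: int,
                 prior_ctr: float = 0.05, prior_weight: int = 100) -> float:
    """Click-through rate pulled toward a prior so new coupons are not judged on tiny samples."""
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)

def sponsored_score(coupon: Coupon, min_relevance: float = 0.3) -> float:
    """Boost a sponsored coupon only if it is relevant enough to begin with."""
    if coupon.bid <= 0 or coupon.relevance < min_relevance:
        return coupon.relevance
    boost = coupon.bid * smoothed_ctr(coupon.clicks, coupon.impressions)
    return coupon.relevance * (1.0 + boost)

unrelated = Coupon("Unrelated but paid", relevance=0.1, bid=5.0)
relevant = Coupon("Relevant and paid", relevance=0.8, clicks=50, impressions=400, bid=2.0)
print(sponsored_score(unrelated), sponsored_score(relevant))
```

Because the boost multiplies the relevance score instead of replacing it, a paid coupon that nobody clicks gains very little, and one that falls below the floor gains nothing.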

The system we use bears some resemblance to some of the similar stuff from Google. It’s interesting to observe the dynamic when discussing emulating Google’s approach to something. Google has been such an extreme success story that it’s a bit tempting to take an initial position that you can just emulate them as the path to sure riches.

Back in reality, it’s important to break it down into two pieces. First of all, the amazing revenue that Google enjoys is not just a product of their technology; rather, it’s supported by their technology. Emulating the technology doesn’t directly give you any revenue. You need a good business plan and model, and I’ll leave that to others (from my seat it’s in good hands at Judy’s Book).

On the technical side it doesn’t work to just copy either. Instead it’s important to really understand the “why” behind the mechanisms that work and understand how you can apply that same learning to your different problem. So for example, we use a system that supports bidding for ad placement. We measure click-through rates to adjust the value of the ads. But other items are very different: Google ads are separate from the content, whereas in CouponLooker, someone is sponsoring a specific coupon that is part of our existing content.

Adding the sponsored coupons also introduced some new technical issues to deal with. Until now the system needed little back and forth with the database. In effect the engine could load all the coupons and then search them from memory with few changes until new coupons were loaded. With the new system there are much more frequent changes: an advertiser can place new ads or run out of their daily budget, and other factors can change the results over shorter time-frames. I feel like the in-memory approach to the search engine really paid off here since it gave me lots of flexibility to change the details of computing the results. If I had relied much more on pre-computed indices and results it would have been much harder to make this stuff work.

Which isn’t to say that it won’t be possible to pre-compute stuff later, just that if you do that before you really have the details of the system worked out, you will have set in stone mechanisms that are harder to adjust.
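One way to picture why in-memory state helps with fast-changing sponsorship data is the sketch below. It is an illustration under assumptions, not the real engine: the class name `SponsorshipBook`, its methods, and the idea of charging one bid per click are all hypothetical. The point is only that the bid is looked up at query time, so an exhausted budget takes effect on the very next search without rebuilding anything.

```python
class SponsorshipBook:
    """In-memory record of bids and remaining daily budgets, consulted at query time."""

    def __init__(self):
        self._bids = {}     # coupon_id -> bid per click
        self._budgets = {}  # coupon_id -> remaining daily budget

    def place_ad(self, coupon_id, bid, daily_budget):
        """Register or replace the sponsorship for a coupon."""
        self._bids[coupon_id] = bid
        self._budgets[coupon_id] = daily_budget

    def active_bid(self, coupon_id):
        """Return the bid if the budget can still cover one click, else 0.0."""
        bid = self._bids.get(coupon_id, 0.0)
        remaining = self._budgets.get(coupon_id, 0.0)
        return bid if 0.0 < bid <= remaining else 0.0

    def charge_click(self, coupon_id):
        """Deduct one click from the budget and return the amount charged."""
        bid = self.active_bid(coupon_id)
        if bid > 0.0:
            self._budgets[coupon_id] -= bid
        return bid

book = SponsorshipBook()
book.place_ad("c1", bid=0.5, daily_budget=1.0)
book.charge_click("c1")
book.charge_click("c1")
print(book.active_bid("c1"))  # budget is spent, so the coupon is ranked as unsponsored
```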

CouponLooker Search Processing

Thursday, May 10th, 2007

Today we launched yet another update of CouponLooker, this time with a display of popular searches and also recommendations at the end of your search results that suggest additional terms to get better results.

I’d like to go into some interesting details of how the search processing actually works, but I’m not going to share the specific details of how we do rankings. It is a pretty well-known phenomenon that any system you can create that has some benefit to people will get gamed as much as possible by someone out there. Since we provide links and traffic to successful coupon sites, there is some pretty strong motivation (hopefully) for them to want to be the top coupons we feature.

Having said that, sharing general hints about what I consider a “good” coupon and how the actual processing works shouldn’t hurt. Since the goal is to provide end-users the best possible results (good coupons), reinforcing the creation of good coupon data just helps the system.

First of all, the actual process of interpreting the user’s query and scoring all the possible coupons is pretty much live. There is a small amount of caching of first page results, but we do only limited pre-processing of our data. The thought process here is that CPU resources are relatively cheap in today’s world, and keeping the search ranking engine in-memory and live makes it much easier for us to iterate on the algorithm. At some point in the future, when we are so successful that we have filled a full rack with machines and the algorithm is fairly well understood and just being fine-tuned to give great results, we can move to a more complicated, higher-performance pre-processing model. For now we are focusing all the sophistication on the search techniques themselves.

The first step is to process the search input that the user typed in. Break the query into separate words, deal with punctuation and throw away words that are garbage. For example, searching for the word “coupon” on CouponLooker is not helpful: they are all coupons. So we drop it. “a”, “the”, and other similar words are also thrown away.
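A minimal sketch of that first step could look like the following. The stop-word set here is a tiny made-up sample, and keeping `%` inside a token (so “20%” survives) is an assumption of this example, not something stated about the real parser.

```python
import re

# Hypothetical stop-word list; the real one is presumably longer.
STOP_WORDS = {"a", "an", "the", "coupon", "coupons"}

def parse_query(text: str) -> list[str]:
    """Lowercase the query, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9%]+", text.lower())  # punctuation acts as a separator
    return [w for w in words if w not in STOP_WORDS]

print(parse_query("The Dell laptop coupon!"))
print(parse_query("20% off Dell"))
```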

The next step is to scan through all the coupons in our system. Unlike Google, which needs to index the whole Internet, CouponLooker works with a relatively finite source of data, so (thanks to the amazing advances in CPUs) it’s feasible to scan through the whole thing. Our search engine generates a score for how well each coupon matches the user’s query (more on the scoring later). Once it has scored every coupon, it sorts the list by score, and the top of that list is what gets returned to the user.
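The scan-score-sort loop can be sketched in a few lines. The scoring function below is a deliberately naive placeholder (the fraction of query terms found in the coupon text), since the real ranking is not described here; the function names and the `limit` parameter are also assumptions.

```python
def score(terms: list[str], coupon_text: str) -> float:
    """Placeholder score: fraction of query terms that appear in the coupon text."""
    if not terms:
        return 0.0
    words = set(coupon_text.lower().split())
    return sum(1 for t in terms if t in words) / len(terms)

def search(terms: list[str], coupons: list[str], limit: int = 10):
    """Score every coupon in memory, drop non-matches, and return the best ones first."""
    scored = [(score(terms, text), text) for text in coupons]
    scored = [(s, text) for s, text in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]

coupons = ["Dell laptop 10% off", "Free shipping at Acme", "Dell monitor sale"]
for s, text in search(["dell", "laptop"], coupons):
    print(f"{s:.2f}  {text}")
```

A full scan like this costs time proportional to the number of coupons on every query, which is the trade being made here: more CPU per search in exchange for being able to swap out `score` at will.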

But first there is one more phase: we scan through the coupons and for each coupon we look for others that are basically the same thing. For a given set of related coupons, the highest-scoring one is displayed to the user and the others are collapsed under it. This is also the stage at which we generate the list of recommended search terms: we analyze the text of all the result coupons and pick out terms that you didn’t search for that are common in their descriptions. This relies on a much longer set of stop-words, since there are many terms that appear frequently but don’t actually help to differentiate coupons.
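Both halves of this phase can be sketched as follows. How two coupons are judged “basically the same thing” is not spelled out, so this example assumes the simplest possible rule (same merchant and same code); the tuple layout, the stop-word sample, and the function names are likewise hypothetical.

```python
import re
from collections import Counter, defaultdict

def collapse(results):
    """Group (score, merchant, code, description) tuples; best of each group leads."""
    groups = defaultdict(list)
    for r in results:
        groups[(r[1].lower(), r[2].lower())].append(r)  # assumed duplicate key
    collapsed = []
    for members in groups.values():
        members.sort(key=lambda r: r[0], reverse=True)
        collapsed.append((members[0], members[1:]))     # (shown, hidden under it)
    collapsed.sort(key=lambda g: g[0][0], reverse=True)
    return collapsed

# Hypothetical, longer stop-word list used only for recommendations.
RECOMMEND_STOP_WORDS = {"a", "the", "and", "off", "on", "at", "for", "with",
                        "save", "get", "coupon", "code"}

def recommend_terms(descriptions, query_terms, count=3):
    """Suggest words common across result descriptions that were not searched for."""
    counts = Counter()
    for text in descriptions:
        for word in set(re.findall(r"[a-z]+", text.lower())):  # once per coupon
            if word not in RECOMMEND_STOP_WORDS and word not in query_terms:
                counts[word] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:count]]

descriptions = ["Save on Dell laptop", "Dell laptop free shipping", "Dell monitor free shipping"]
print(recommend_terms(descriptions, ["dell"]))
```

Counting each word once per coupon keeps a single wordy description from dominating the suggestions, and ties are broken alphabetically so the output is stable.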

CouponLooker Updates

Sunday, April 22nd, 2007

Rahul writes about our latest update to CouponLooker. This is our third, and one of the more exciting things to me is how quick it is to update and tweak the search algorithm. There are some pretty sophisticated things going on inside the engine, but things have been designed with an eye towards flexibility and rapid iteration. I’m sure equivalent results could be achieved with an approach that uses half as many CPU cycles, but frankly CPU cycles are relatively cheap now and making rapid improvements is invaluable.

One detail that you won’t see using the service directly is that I’ve put in some A-B logic for our own development purposes. I can put the site into a special mode where whenever it does a search it does it twice using random different permutations of our algorithm. We can then experiment with this in the office or with test subjects and ask people to search for useful things and give the system feedback on which of the result-sets look more useful.
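A stripped-down version of that kind of A-B mode might look like this. Everything here is assumed for illustration: the idea that each algorithm variant is a named function, the function names `ab_search` and `record_preference`, and the simple vote counter used for feedback.

```python
import random
from collections import Counter

def ab_search(query, variants, rng=random):
    """Run the query through two randomly chosen variants of the algorithm."""
    name_a, name_b = rng.sample(sorted(variants), 2)
    return {name_a: variants[name_a](query), name_b: variants[name_b](query)}

votes = Counter()

def record_preference(winner: str) -> None:
    """Store which variant's result set the tester found more useful."""
    votes[winner] += 1

# Hypothetical variants standing in for different tunings of the real engine.
variants = {
    "baseline": lambda q: [q],
    "boost_merchant": lambda q: [q.upper()],
    "strict_match": lambda q: [],
}

result_sets = ab_search("dell", variants)
for name, results in result_sets.items():
    print(name, results)
record_preference(next(iter(result_sets)))  # pretend the tester liked the first set
```

In a real test the variant names would be hidden from the person choosing, so the vote reflects only the quality of the results.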

CouponLooker Vertical Search Behind the Scenes

Thursday, April 19th, 2007

One of the interesting projects I’ve worked on recently is building the CouponLooker vertical search engine for Judy’s Book along with some of their in-house developers and designers. Vertical search is a recently popular category as people have realized that Google, while it does a great job for general content topics, often returns poor results in various specific categories. Those categories are often marked by either structured data or else a commerce-related reason why people have spammed the general search results (or ideally both). My own CalendarData.com site tackles one such vertical search category (public events), and CouponLooker aims at another one that is covered by both of those criteria.

The scenario for CouponLooker is pretty simple. You are buying your favorite stuff somewhere on the Internet and are checking out when you see a textbox that looks like this:

Apply Coupon Code

Promo code? Coupon code? I dunno, is there a coupon for this vendor? Am I the chump who is paying full price while others are saving? But go search on Google and you will likely get lots of results that try to steer your purchase to other stores, that provide old expired info or just generic results that aren’t very useful at all.

The search site we built has four main components:

1) Data acquisition: pulling in coupon data, processing it and storing it.

2) The search engine: picking which coupons are the best matches for a user’s search.

3) The web-site itself

4) The blog widgets that let you host CouponLooker search on your own blog.

I’ll go into more details about the designs of these components in subsequent posts (but I’m not saying I’m going to cover them in any specific order).