CouponLooker Search Processing
Today we launched yet another update of CouponLooker, this time with a display of popular searches and also recommendations at the end of your search results that suggest additional terms to get better results.
I’d like to go into some interesting details of how the search processing actually works, but I’m not going to share the specifics of how we do rankings. It is a pretty well-known phenomenon that any system that provides some benefit to people will get gamed as much as possible by someone out there. Since we provide links and traffic to successful coupon sites, there is (hopefully) some pretty strong motivation for them to want their coupons to be the top ones we feature.
Having said that, sharing general hints about what I consider a “good” coupon and how the actual processing works shouldn’t hurt. The goal is to give end-users the best possible results (good coupons), so anything that encourages the creation of good coupon data just helps the system.
First of all, the actual process of interpreting the user’s query and scoring all the possible coupons is pretty much live. There is a small amount of caching of first-page results, but we do only limited pre-processing of our data. The thinking here is that CPU resources are relatively cheap in today’s world, and keeping the search ranking engine in-memory and live makes it much easier for us to iterate on the algorithm. At some point in the future, when we are so successful that we have filled a full rack with machines and the algorithm is fairly well understood and only being fine-tuned, we can move to a more complicated, higher-performance pre-processing model. For now we are focusing all the sophistication on the search techniques themselves.
The first step is to process the search input that the user typed in: break the query into separate words, deal with punctuation, and throw away words that are garbage. For example, searching for the word “coupon” on CouponLooker is not helpful, since they are all coupons, so we drop it. “a”, “the”, and other similar words are also thrown away.
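To make that first step concrete, here is a minimal Python sketch of the idea, not our actual code. The stop-word list and the name parse_query are made up for illustration; the only stop-words named above are “coupon”, “a”, and “the”.

```python
import re

# Hypothetical stop-word list; only "coupon", "a" and "the" come from the post.
STOP_WORDS = {"coupon", "coupons", "a", "an", "the", "of", "for", "and"}

def parse_query(raw):
    """Lower-case the query, strip punctuation, and drop stop-words."""
    # Runs of letters and digits become words; everything else is a separator.
    words = re.findall(r"[a-z0-9]+", raw.lower())
    return [w for w in words if w not in STOP_WORDS]
```

With this sketch, parse_query("The best Pizza coupon!") returns ["best", "pizza"].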
The next step is to scan through all the coupons in our system. Unlike Google, which needs to index the whole Internet, CouponLooker works with a relatively small, finite set of data, so (thanks to the amazing advances in CPUs) it’s feasible to scan through the whole thing. Our search engine generates a score for how well each coupon matches the user’s query (more on the scoring later). Once it has scored every coupon, it sorts the list by score, and the sorted list is what gets returned to the user.
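The scan-score-sort loop looks roughly like the sketch below. The score function here is only a placeholder (it counts how many query terms appear in the coupon text) and is not the ranking we actually use. The Coupon fields, the function names, and the choice to drop coupons that score zero are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Coupon:
    # Hypothetical fields; real coupon records carry more data.
    merchant: str
    description: str

def score(coupon, terms):
    """Placeholder score: number of query terms found in the coupon text."""
    words = set(f"{coupon.merchant} {coupon.description}".lower().split())
    return sum(1 for t in terms if t in words)

def search(coupons, terms):
    """Score every coupon in memory, then return the matches best-first."""
    scored = [(score(c, terms), c) for c in coupons]
    # Assumption: coupons that match nothing are left out of the results.
    matches = [(s, c) for s, c in scored if s > 0]
    # list.sort is stable, so coupons with equal scores keep their input order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in matches]
```

Because everything is scored on each request, trying out a new scoring idea only means swapping the score function; there is no index to rebuild.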
But first there is one more phase: we scan through the coupons and, for each one, look for others that are basically the same thing. For a given set of related coupons, the highest-scoring one is displayed to the user and the others are collapsed under it. This is also the stage at which we generate the list of recommended search terms: we analyze the text of all the result coupons and pick out terms that you didn’t search for but that are common in their descriptions. This relies on a much longer set of stop-words, since many terms appear frequently without doing anything to differentiate one coupon from another.
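Here is a rough Python sketch of both halves of that phase. How we really decide that two coupons are “basically the same thing” is not described here, so the sketch simply assumes that two coupons match when they share a merchant and the same set of description words. The tuple layout, the function names, and the extended stop-word list are likewise invented for the example.

```python
import re
from collections import Counter

# Hypothetical extended stop-word list used only for suggestions.
SUGGESTION_STOP_WORDS = {"a", "an", "the", "of", "off", "on", "any", "and",
                         "with", "free", "your", "coupon", "purchase"}

def collapse(ranked):
    """Group near-duplicate results.

    `ranked` is a best-first list of (score, merchant, description) tuples.
    Returns a list of (best, duplicates) pairs in the original rank order.
    """
    groups = {}  # dicts preserve insertion order, so rank order is kept
    for item in ranked:
        _, merchant, description = item
        # Assumed similarity test: same merchant, same set of words.
        key = (merchant.lower(), frozenset(description.lower().split()))
        groups.setdefault(key, []).append(item)
    # The first item in each group is the highest scoring, since input is sorted.
    return [(items[0], items[1:]) for items in groups.values()]

def recommend_terms(descriptions, query_terms, limit=3):
    """Suggest words common across result descriptions that were not searched for."""
    skip = SUGGESTION_STOP_WORDS | set(query_terms)
    counts = Counter()
    for text in descriptions:
        # Count each word once per coupon so one wordy coupon cannot dominate.
        for word in sorted(set(re.findall(r"[a-z]+", text.lower()))):
            if word not in skip:
                counts[word] += 1
    return [word for word, _ in counts.most_common(limit)]
```

The suggestion step is where the longer stop-word list earns its keep: without it, words like “free” and “off” would top nearly every list.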


May 10th, 2007 at 1:44 pm
[...] I’ve been posting over on the Launch21 blog about the CouponLooker vertical-search engine that I’ve been doing for Judy’s Book. Interesting stuff- this is one of the more fun projects I’ve done in a while. I suppose I should be embarrassed to admit that I did about zero research into how search is supposed to be done. Still, the results seem pretty good and its fun to share some of the techniques that I’ve developed so far. [...]