The Failings of Collaborative Filtering, or How Too Many Cooks Spoil the Soup
Mar 26th, 2008 by Scott Oddo
Collaborative filtering has become one of the most popular methods for recommending products or content to customers based on their past history. It’s popular because it is easy to understand, and because there are plenty of off-the-shelf versions you can buy, or you can build your own fairly easily. Collaborative filtering clusters customers together based on a common purchase history. The idea is that similar people like similar things. If you group an individual with other people who have similar interests, you can find items that others in the group have purchased but the individual hasn’t. Because of the group’s shared interests, the individual is more likely to buy those items.
There are two main problems with this approach. The first is that, in deciding which items to recommend, the algorithm usually picks the items most popular within the group that the individual has yet to purchase. Popularity thus becomes a self-reinforcing factor in the recommendations: popular items get recommended more and are therefore purchased more often, which in turn makes them even more popular. The second is that while people can be very similar in certain tastes, every individual is different. When you group people together and then look for the things they don’t have in common, the recommendation you think is a hidden gem may really just be one of the things that sets this individual apart from the rest of the group. Instead of finding something they like, you may actually be finding something they don’t like.
The quality of the recommendations generated by collaborative filtering depends on how alike the people in a cluster are. If they are exactly alike, all the recommendations will be good, but the more their interests diverge, the worse the recommendations become. It’s like the old adage about too many cooks spoiling the broth. The cooks could work together if they all agreed on the ingredients and the amounts, but every cook is a little different and wants to add their own special ingredients. Occasionally this leads to a wonderful new taste experience, but more often than not it leads to a mess that no one wants to eat.
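To make the mechanics concrete, here is a minimal sketch of the group-based approach described above. It is only an illustration under stated assumptions, not any vendor's actual implementation: the function names, the Jaccard overlap measure, the similarity threshold, and the purchase data are all made up for the example.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two purchase sets: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, user, min_similarity=0.2, top_n=3):
    """Recommend the items most popular among similar customers that `user` lacks.

    `purchases` maps a customer name to the set of items they bought.
    `min_similarity` is an assumed cut-off for who counts as "in the group".
    """
    mine = purchases[user]
    counts = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        if jaccard(mine, items) >= min_similarity:
            # Every unowned item gets one vote per neighbor who bought it.
            counts.update(items - mine)
    # Most votes first; ties broken alphabetically so the result is repeatable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories, invented for illustration.
purchases = {
    "ann": {"jazz_cd", "blues_cd", "bestseller"},
    "bob": {"jazz_cd", "blues_cd", "bestseller", "soul_cd"},
    "cat": {"jazz_cd", "bestseller", "cookbook"},
    "dan": {"jazz_cd", "blues_cd"},
    "eve": {"cookbook", "garden_tools"},
}

print(recommend(purchases, "dan"))
```

In this toy data, dan's top recommendation is the bestseller simply because every neighbor owns it, and the cookbook appears only because one loosely similar neighbor happens to own it. Nothing in the code knows that dan buys music, which is exactly the blind spot discussed above.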
A good example of this was recently described in an article on ReadWriteWeb. The article describes Aggregate Knowledge’s content discovery network (Pique) and how it works on the BusinessWeek website. As a user clicks on stories on the site, the discovery network recommends other articles of interest in a menu on the right-hand side of the page. The author complained that the recommended stories seemed to have nothing in common with the stories he had clicked on. He pointed out that the collaborative filtering took into account only the popularity of items among people who had read the same stories as he had, and ignored the actual content of the stories.
The failure of collaborative filtering to take content into account isn’t so obvious when it is applied at a service provider that only sells items that are similar or strongly related in some way. However, when the service provider is more general, overlaps in people’s interests become more common and less meaningful. Think about how the average person reads BusinessWeek or The Wall Street Journal. I might read a story about Ben Bernanke lowering interest rates and then read a story about Ford selling Jaguar to India’s Tata. There is no strong connection between the stories, and I don’t have a deep interest in monetary policy or the automotive industry. I read the stories because I have a general interest in the overall health of the economy. This kind of user behavior is common at a site with very general content, and the more general the content is, the more general the user behavior will be. You won’t be able to predict anything about the user in that situation unless you analyze the content of what they are reading and see how their affinity for that content evolves over time.
In one of the examples from the article, the author reads a story about Apple considering giving people unlimited free access to its music library. However, none of the recommended stories have anything to do with technology or music. They are all just popular business stories that happen to have been read by the same people who read the Apple article. Some of those people read the Apple article simply because it is another business story about a very well-known and fairly important corporation. Others read it because they have a deep interest in technology. If the system had analyzed the content of this article and compared it to the content of other articles the author had read over a period of time, it would have picked up on the fact that the author is interested in businesses that deal with technology, particularly ones with products related to computers or the Internet.
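As a rough sketch of what analyzing content and tracking affinity over time could look like, the example below builds a reader profile from topic tags and ranks candidate stories against it. It assumes each article already carries hand-assigned topic tags, which skips the genuinely hard part of content analysis. The function names, the decay factor, the tags, and the headlines are all hypothetical.

```python
from collections import defaultdict


def build_profile(history, decay=0.8):
    """Turn a reading history (oldest first) into topic affinities.

    Before each new article is counted, existing weights are multiplied by
    `decay`, so recent reading matters more than old reading.
    """
    profile = defaultdict(float)
    for topics in history:
        for topic in list(profile):
            profile[topic] *= decay
        for topic in topics:
            profile[topic] += 1.0
    return dict(profile)


def score(profile, topics):
    """Average affinity across an article's topics (0.0 if it has none)."""
    if not topics:
        return 0.0
    return sum(profile.get(t, 0.0) for t in topics) / len(topics)


def recommend_by_content(profile, candidates, top_n=2):
    """Rank candidate articles by how well their topics match the profile."""
    ranked = sorted(
        candidates,
        key=lambda title: (-score(profile, candidates[title]), title),
    )
    return ranked[:top_n]


# Hypothetical reading history, oldest first: topic tags per article read.
history = [
    {"economy", "monetary_policy"},
    {"automotive", "mergers"},
    {"technology", "music"},
    {"technology", "internet"},
]

# Hypothetical candidate stories with their topic tags.
candidates = {
    "Fed holds rates": {"economy", "monetary_policy"},
    "New music phone launches": {"technology", "music"},
    "Automaker posts loss": {"automotive"},
}

profile = build_profile(history)
print(recommend_by_content(profile, candidates))
```

Here the repeated technology reading outweighs the one-off economy and automotive stories, so the technology story ranks first even though it may not be the most popular story on the site.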
Examples like these raise the question: why do companies use collaborative filtering without doing any content analysis? The answer is that collaborative filtering is easy and content analysis is very hard. In particular, content analysis requires gathering a lot of data about the content being served, whereas collaborative filtering requires no data about the content at all. You only need to know consumption behavior, which is something service providers already have. So they can use collaborative filtering and get a little boost in sales by adding a recommendation feature that works okay in some situations, with very little start-up cost.
At matchmine, however, we are not interested in making merely okay recommendations. We want to make great recommendations, and to do that you have to recognize that content is king. That is why we have amassed a huge catalog of content across several media types, and the catalog continues to grow every day. We have a team of scientists dedicated to analyzing content using the most sophisticated techniques available. All of this is designed to recognize and understand the preferences of an individual so that we can make the best recommendations to that individual. No two people are exactly alike, and neither are any two pieces of content. Real personalization is only achieved through a deep understanding of the content that an individual is interested in. Our recommendations aren’t for everybody – they’re just for you.


