In our last blog post, we talked about personalized recommendations and mentioned YouTube and Amazon as specific examples. I wanted to add another example to this list: Netflix. Recommender systems are a technology that Netflix helped to promote. They have always played a key role in Netflix' core business, and Netflix' contributions to the domain have attracted interest from the scientific community.
Netflix has made it clear that recommender systems do work well: about 75% of what people watch at Netflix is due to a recommendation. But what is even more fascinating is the evolution of the technology throughout the history of the company.
Netflix: The DVD-by-mail business
Netflix is now a global provider of streaming video, available in over 190 countries and 21 different languages. However, before Netflix started offering streaming in 2007, it was a DVD-by-mail service available only in the US. Netflix would send you a DVD by mail, you'd watch the movie, send the DVD back to Netflix, and then start over.
If you wanted, you could rate movies that you watched (on a 1 to 5 scale), and Netflix would then recommend movies to you based on how you rated the movies you watched. These personalized recommendations were a central part of the Netflix experience, and therefore the quality of the recommendations had a large impact on Netflix' business.
Given the importance of high-quality recommendations, in 2006 Netflix launched a public competition, the "Netflix Prize". The goal was to improve their in-house recommender system (called Cinematch) by at least 10%. Almost anyone could compete, and the winning team would win one million US dollars.
As a competitor, you had to develop a recommender system and train it on a dataset that Netflix made publicly available (it is still available for download). The data consisted essentially of star ratings that (anonymized) users gave to movies. The goal was to predict which rating a user would give to a movie they had not yet watched. You would submit your predictions on a large test set (on which the ratings provided by users had been withheld), and Netflix would compute the root-mean-square error (RMSE) on this test set and report it back to you.
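As a rough illustration of the metric, RMSE can be computed as sketched below. The ratings and predictions are made-up values, and the function names `rmse` and `relative_improvement` are assumptions chosen for this example, not Netflix' evaluation code.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers the baseline RMSE."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse

# Made-up example: withheld star ratings and a model's predictions for them.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]
score = rmse(predicted, actual)
```

Lower is better: a model that predicts every withheld rating exactly has an RMSE of 0, and large individual errors are penalized more heavily than small ones because the errors are squared before averaging.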
Given the strong financial incentive, thousands of teams competed (including yours truly, under the name kouburgs), and the prize was ultimately won - barely, and only a few minutes before the three-year-long competition ended. The winning team ("BellKor's Pragmatic Chaos") was in fact a fusion of three previously independent teams ("BellKor", "BigChaos", and "Pragmatic Theory") which blended their algorithms. The prize would not have been won had the teams not formed a last-minute alliance!
Aside from being a great marketing stunt, Netflix' original goal in launching a public competition was of course to be able to make use of the algorithms that were presented by the winning team(s). The winning solution was a blend of the solution of three teams, and each team's solution itself consisted of many algorithms that were blended together. The three teams that formed the winning alliance publicly documented their approaches 1 2 3. These documents are lengthy reads, due to the high complexity of the solutions.
It turned out that it was too difficult for Netflix to replicate the full solution that won the Netflix Prize. However, some of the algorithms used to win the prize, such as singular value decomposition (SVD) and restricted Boltzmann machines (RBMs), were incorporated into Netflix' production environment.
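To give a feel for what "SVD" means in this context, here is a minimal sketch of matrix factorization trained by stochastic gradient descent on observed ratings only, which is the family of models usually meant by that name in Netflix Prize discussions. It is not the winning solution and not Netflix' production code: the toy ratings, the hyperparameters, and the names `train_mf`, `predict`, and `train_rmse` are all assumptions made for illustration.

```python
import math
import random

def train_mf(ratings, n_users, n_items, n_factors=2,
             lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit rating ~ mu + p_u . q_i to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + sum(a * b for a, b in zip(P[u], Q[i])))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q

def predict(model, u, i):
    """Predicted rating of user u for item i."""
    mu, P, Q = model
    return mu + sum(a * b for a, b in zip(P[u], Q[i]))

def train_rmse(model, ratings):
    """RMSE of the model on the ratings it was trained on."""
    total = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))

# Toy data (hypothetical): users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 1, 5), (1, 3, 1),
           (2, 2, 5), (2, 3, 4), (2, 0, 1), (3, 2, 4), (3, 3, 5), (3, 1, 2)]
model = train_mf(ratings, n_users=4, n_items=4)
# Reference model that always predicts the global mean (all factors zero).
mean_only = (model[0], [[0.0] * 2 for _ in range(4)], [[0.0] * 2 for _ in range(4)])
```

Each user and each movie is described by a short vector of learned factors, and a predicted rating is the global mean plus the dot product of the two vectors. Calling `predict(model, 0, 3)` gives an estimate for a user and movie pair that does not appear in the training data.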
Big changes at Netflix
Shortly after the completion of the Netflix Prize, Netflix started focussing more and more on video-on-demand (VoD). This change also affected the demands put on Netflix' recommender system. Xavier Amatriain and Justin Basilico from the Personalization Science and Engineering department of Netflix say it best:
One of the reasons our focus in the recommendation algorithms has changed is because Netflix as a whole has changed dramatically in the last few years. Netflix launched an instant streaming service in 2007, one year after the Netflix Prize began. Streaming has not only changed the way our members interact with the service, but also the type of data available to use in our algorithms. For DVDs our goal is to help people fill their queue with titles to receive in the mail over the coming days and weeks; selection is distant in time from viewing, people select carefully because exchanging a DVD for another takes more than a day, and we get no feedback during viewing. For streaming members are looking for something great to watch right now; they can sample a few videos before settling on one, they can consume several in one session, and we can observe viewing statistics such as whether a video was watched fully or only partially.
In other words: Netflix users do not want to go through the effort of providing star-based ratings when streaming video from Netflix. Therefore, Netflix had to adapt its recommender system to make use of other kinds of indications of viewer preferences, such as whether a video was watched fully or only partially.
Star-based feedback is often referred to as explicit feedback, whereas viewing data is referred to as implicit feedback. Netflix is not the only company to rely on implicit feedback to generate video recommendations 4. Implicit feedback has the advantage of being easier to collect, since the viewer can remain completely passive.
However, implicit feedback also has a number of disadvantages compared to explicit feedback:
- No negative feedback is possible. With star-based ratings, a rating of one star indicates that the viewer did not like the video. With implicit feedback, there is no equivalent signal that demonstrates that the viewer strongly disliked the video.
- Implicit feedback is noisier. Providing explicit feedback (star-based ratings) requires mental effort, which is why it is likely to be less noisy than implicit feedback data.
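One way to work with such signals is the preference and confidence scheme from the Hu, Koren, and Volinsky paper cited in this post: every observation becomes a binary preference, and the amount of viewing only controls how much that preference is trusted. The sketch below is a simplified illustration; using the watched fraction of a video as the raw signal and the function name are assumptions, and nothing here is claimed to be what Netflix actually does.

```python
def to_preference_and_confidence(watch_fraction, alpha=40.0):
    """Map an implicit signal to (preference, confidence).

    preference is 1 if the video was watched at all, else 0.
    confidence is 1 + alpha * watch_fraction, so unwatched videos still
    carry a small, non-zero weight as weak negative evidence.
    """
    if not 0.0 <= watch_fraction <= 1.0:
        raise ValueError("watch_fraction must be between 0 and 1")
    preference = 1 if watch_fraction > 0 else 0
    confidence = 1.0 + alpha * watch_fraction
    return preference, confidence

# Hypothetical viewing log: video id -> fraction of the video watched.
viewing_log = {"video_a": 1.0, "video_b": 0.5, "video_c": 0.0}
signals = {vid: to_preference_and_confidence(frac)
           for vid, frac in viewing_log.items()}
```

This addresses both disadvantages to some degree: a video that was never watched is treated as a low-confidence negative rather than as a definite dislike, and a video that was only sampled briefly counts for less than one that was watched to the end.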
The change at Netflix from DVD-by-mail to VoD, and from explicit to implicit feedback, also required new algorithms, different from those employed during the Netflix Prize.
Netflix is an online shop
Netflix is an online shop for video on demand. The recommender system that Netflix uses for video recommendations is very similar to the ones used for product recommendations at online shops for physical (as opposed to digital) goods: both mostly rely on implicit feedback.
When a visitor looks at a product (or views a video at Netflix), this is taken as a hint that this visitor is interested in this product. Not visiting a product is taken as a (weak) signal that the visitor is not interested in the product.
In a recent blog post, Netflix mentions a number of challenges related to the fact that Netflix is available in many countries:
- Not all videos are available in each country. Each country has its own catalog
- Different languages have to be supported
This also poses a number of interesting questions for the recommender system: To what extent are users defined by their country? E.g. should viewers in India receive mostly recommendations for Bollywood movies and viewers in Argentina mostly recommendations for Argentine movies? But what if an Indian and Argentine viewer both watched a lot of Sci-Fi: would their recommendations have to be similar?
All of these problems are also faced by other online shops as well as by the recommender systems running on these shops. Especially in Europe, many shops sell products in many different countries, and usually the product catalog is different for each country (different products, titles, prices, currencies, languages). The shop's recommender system has to take all of this into account. If a visitor is shopping on the Portuguese version of a multi-country online shop, the recommender system should display only products available in Portugal, with titles in Portuguese and prices in euros.
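As a minimal sketch of the Portuguese example, a ranked list of recommended product IDs can be filtered against the catalog of the country being served, which also supplies the localized title, price, and currency. The data model (`LocalizedProduct`, the `catalogs` dictionary, the product IDs and prices) is entirely hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocalizedProduct:
    product_id: str
    title: str      # title in the language of the country's shop
    price: float
    currency: str

def localize(ranked_ids, catalog, limit=3):
    """Keep only recommendations sold in this country, preserving rank order."""
    results = []
    for product_id in ranked_ids:
        if product_id in catalog:
            results.append(catalog[product_id])
        if len(results) == limit:
            break
    return results

# Hypothetical per-country catalogs: product "b" is not sold in Portugal.
catalogs = {
    "PT": {
        "a": LocalizedProduct("a", "Sapatilhas de corrida", 59.90, "EUR"),
        "c": LocalizedProduct("c", "Mochila", 34.50, "EUR"),
    },
    "DE": {
        "a": LocalizedProduct("a", "Laufschuhe", 64.90, "EUR"),
        "b": LocalizedProduct("b", "Regenjacke", 89.00, "EUR"),
    },
}
recommended_pt = localize(["a", "b", "c"], catalogs["PT"])
```

Filtering after ranking is the simplest option. A real system might instead restrict candidates to the local catalog before scoring, so that unavailable products never compete for a slot.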
However, some versions of the site might receive very little traffic, which could lead to low-quality recommendations. For example, it is possible that the Portuguese version of the site receives very little traffic, whereas the German version of the same site receives a lot of traffic. The recommender system should be able to combine and exploit the data collected on both the Portuguese and the German versions of the site in order to provide accurate recommendations on the Portuguese site, even if the product catalog is not exactly the same in both countries.
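One simple way to combine the two data sources is to blend the popularity observed on the low-traffic site with the popularity observed on the high-traffic site, trusting the local data more as it accumulates. The sketch below is only an illustration of that idea: the function name, the `prior_strength` parameter, and the view counts are assumptions, and real systems typically use richer models than raw view counts.

```python
def blended_scores(local_counts, other_counts, local_catalog, prior_strength=50.0):
    """Score products in the local catalog from local and borrowed view counts.

    The weight on local data grows with the amount of local traffic:
    weight = local_total / (local_total + prior_strength).
    """
    local_total = sum(local_counts.values())
    other_total = sum(other_counts.values())
    weight = local_total / (local_total + prior_strength)
    scores = {}
    for product_id in local_catalog:  # never score products not sold locally
        local_rate = local_counts.get(product_id, 0) / local_total if local_total else 0.0
        other_rate = other_counts.get(product_id, 0) / other_total if other_total else 0.0
        scores[product_id] = weight * local_rate + (1.0 - weight) * other_rate
    return scores

# Hypothetical view counts: very little Portuguese traffic, plenty of German traffic.
pt_counts = {"a": 1, "b": 1}
de_counts = {"a": 100, "b": 900, "z": 1000}   # "z" is not sold in Portugal
pt_catalog = {"a", "b", "c"}
scores = blended_scores(pt_counts, de_counts, pt_catalog)
```

With only two Portuguese views, products "a" and "b" are indistinguishable on local data alone, so the German data decides the order. Product "z" is ignored because it is not in the Portuguese catalog, which handles the case where the catalogs differ.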
Netflix has made extensive use of recommender systems throughout the evolution of the company from renting DVDs by mail to serving VoD. Netflix faces the same challenges as any online shop when it comes to offering the highest quality recommendations to its users.
Recommender systems that can help every single user find the movies or products they are interested in are hard to build. They have to deal with different catalogs, types of feedback data, and user profiles (tastes, geolocation, etc.). In one of our next blog posts we will look at how such recommender systems are designed.
Martin Piotte and Martin Chabbert. "The Pragmatic Theory solution to the Netflix Grand Prize."
Yifan Hu, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." Eighth IEEE International Conference on Data Mining (ICDM'08).