In this blog post, I want to shed some light on how your personal data is used on the web for personalization and advertising. It is a topic that is often misunderstood because of its ever-growing complexity. For consumers, it is rarely clear what happens to the data they generate while browsing the web, and even less clear how that data is used for advertising.
At TasteHit, we develop a personalization layer for online shopping, which is similar to advertising in terms of technology. In my role as CEO, I have had the chance to work with many different online shops on their data-related challenges. Based on this experience, I will first explain how data is collected and then relate this to the technology used to make sense of it.
The importance of data
In our latest blog posts, we looked at the science behind intelligent algorithms used in different contexts, such as Google's AlphaGo or DeepArt's artistic filters. What these projects have in common is the use of predictive, data-driven algorithms. Designing a learning algorithm for a specific task is complex, but because these algorithms have to be trained on huge amounts of data, the biggest challenge is often finding enough training data. In machine learning, the more data an algorithm can be trained on, the better it generally becomes at the task it is trained for.
This is why personalization and machine learning go so well together: it is relatively easy to collect large amounts of anonymous user behavior data on websites. Once collected, this data can be analyzed with machine learning, and the browsing history of a single user can be compared with the browsing behavior of millions of other users to make predictions and create personalized experiences. This approach is called "collaborative filtering", and it is often used for tasks such as generating personalized recommendations on shopping sites.
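To make the idea concrete, here is a minimal user-based collaborative filtering sketch in Python. It assumes each user's history is just a set of viewed product ids; the users, products and the choice of cosine similarity are illustrative assumptions, not TasteHit's or any other company's actual method. A product is recommended to a user when similar users viewed it.

```python
from math import sqrt

# Hypothetical browsing histories: user id -> set of product ids the user viewed.
histories = {
    "alice": {"shovel", "rake", "gloves"},
    "bob": {"shovel", "rake", "hose"},
    "carol": {"cookbook", "whisk"},
}

def cosine(a: set, b: set) -> float:
    """Cosine similarity between two binary (viewed / not viewed) vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user: str, histories: dict, k: int = 3) -> list:
    """Score unseen items by the summed similarity of the users who viewed them."""
    seen = histories[user]
    scores = {}
    for other, items in histories.items():
        if other == user:
            continue
        sim = cosine(seen, items)
        if sim == 0:
            continue  # users with nothing in common carry no signal
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

print(recommend("alice", histories))  # ['hose']
```

Real systems work with millions of users, weight signals such as purchases more heavily than views, and rely on far more scalable techniques, but the principle of comparing one history with many others is the same.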
A "personalized experience" can take different forms, even though the technology and algorithms applied to the data may be similar. Whenever you read your news feed on Facebook or Twitter, watch videos on Netflix, or scroll through products on Amazon, the choice of content and the order in which it is presented to you depend on your past behavior on that service: the last videos you watched, the products you previously visited, your location, etc. Two different people logging in to Facebook will see completely different content, and each of the now 1.7B Facebook users has a personalized view of "their Facebook", based on the data Facebook has collected about them.
The more a service is tailored to your interests, the more you will use it, and the better the service will become. However, personalization is also used for other purposes.
The business model of the web
At TasteHit, we often talk about the role of "personalized product recommendations" in online shopping. When we mention this term to people outside the tech world, most of them immediately think of targeted ads: "Last time I looked at this pair of shoes on site X, ads for this exact pair followed me around the web for weeks!".
When done right, personalization features are generally perceived as useful and pleasant: music recommendations on Spotify, movie recommendations on Netflix, product recommendations in an online shop. Sometimes the reverse is true: personalized ads, for example, are often (but not always[1]) perceived as intrusive and annoying. The adoption curve of ad blockers shows that this sentiment keeps growing among consumers.
The collection and use of personal data by for-profit companies and public organizations is a growing concern among consumers. One of the key events that drew international attention to privacy and personal data was Edward Snowden's disclosure of the NSA's mass surveillance programs in 2013.
Years ago, people tended to trust companies and governments to respect their digital privacy and to use the data they collected solely to offer a better service. Today, consumers are starting to understand the value of their personal information.
When digital privacy started becoming important in the eyes of consumers, one word became the scapegoat of all privacy-related accusations: "cookies".
Before continuing with the comparison of personalization and advertising, I need to take a short technical detour and talk about cookies. Cookies are the cornerstone "technology" (if we can really call it that) for collecting consumer click data in personalization and advertising, but many people are still confused about what cookies are and how they are used.
Simply put, a cookie is a small piece of text that a website asks your browser to store on your device. Modern browsers keep cookies in their own internal storage rather than as separate files, but you can easily look at them in your browser's developer tools. The role of cookies is to help the websites you visit "remember who you are".
When you visit a page of a website, your browser opens an HTTP connection to the server hosting the website and downloads the HTML page from it, say the product page of a shovel on the online shop MyGardenTools.com.
Now imagine that you add this shovel to your cart by clicking on the "Add to cart" button of the site, and then go to the home page of MyGardenTools.com.
Without some extra mechanism, the server has no way of knowing that the same computer or mobile phone opened both the shovel page and the home page. That's because HTTP, the protocol used to request and download HTML pages, is stateless: each request stands on its own, and the protocol does not require the MyGardenTools.com web server to keep any information between the two requests you sent.
This is what cookies are for: identifying you across consecutive requests. When you visit the first page, the server sends a cookie containing a unique identifier along with the HTML page. Your browser then automatically sends this cookie back with each following request to the same site: first with the add-to-cart request, and then with the request to load the home page.
The server receives each request along with the cookie, which carries an anonymous (but personal) identifier for your device. Something like: "Load the home page, and make sure it is customized for ANONYMOUSUSER123456". Why does the server need to know you are ANONYMOUSUSER123456? Because you just added a shovel to your shopping cart, and you expect the server to remember it. When you go to the home page, you want the little shopping cart icon in the top-right corner of your screen to show that you have one product in your cart. And if you click on the icon, it should open your own cart with the shovel you just added. None of this would be possible without cookies.
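The exchange above can be simulated in a few lines of Python with the standard library's `http.cookies` module. This is a minimal sketch, not how MyGardenTools.com or any real shop works: the cookie name `visitor_id`, the URL paths and the in-memory `carts` dictionary are hypothetical. What it illustrates is that the only link between the three requests is the identifier the browser sends back.

```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical server-side store: one cart per anonymous visitor id.
carts = {}

def handle_request(path, cookie_header=""):
    """Handle one HTTP request; return (page_text, Set-Cookie value or None).

    HTTP keeps no state between requests: the only thing linking them is the
    visitor id that the browser sends back in its Cookie header.
    """
    cookie = SimpleCookie(cookie_header)
    set_cookie = None
    if "visitor_id" in cookie:
        visitor = cookie["visitor_id"].value
    else:
        # First visit: mint a random id and ask the browser to store it.
        visitor = "ANONYMOUSUSER" + uuid.uuid4().hex[:6].upper()
        out = SimpleCookie()
        out["visitor_id"] = visitor
        out["visitor_id"]["path"] = "/"
        set_cookie = out["visitor_id"].OutputString()
    cart = carts.setdefault(visitor, [])
    if path == "/cart/add/shovel":
        cart.append("shovel")
    return f"{path} for {visitor}: {len(cart)} item(s) in cart", set_cookie

# Simulated browser: keep the cookie from the first response and send it back.
page, set_cookie = handle_request("/products/shovel")
cookie_header = set_cookie.split(";")[0]  # "visitor_id=ANONYMOUSUSER..."
handle_request("/cart/add/shovel", cookie_header)
page, _ = handle_request("/", cookie_header)
print(page)  # the home page reports 1 item in the cart
```

If the browser dropped the cookie, every request would look like a brand-new visitor and the home page would show an empty cart.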
So why do people say that cookies are evil? Let's continue with our example. You just added a shovel to your shopping cart on MyGardenTools.com. Now imagine you navigate to your favorite blog, "DeliciousRecipes.com", and start reading today's recipe. In the right column, the shovel you just looked at on MyGardenTools.com appears in an animated ad, along with other products from MyGardenTools.com. Surprise: this is not a coincidence, and after the previous paragraph, you probably suspect that cookies are involved. Bingo!
This situation is almost identical to the previous example, with a twist. In the add-to-cart case, MyGardenTools.com sets a cookie in your browser to track your activity on its own site. This type of cookie is called a first-party cookie. The shovel ad is only possible because MyGardenTools.com explicitly asked a third-party company to place that company's own cookie in your browser, typically by embedding on its pages a small script or image served from the third party's domain. Again, the cookie is a small piece of text containing a unique identifier (say TRACKEDUSER424242), so that when you navigate to DeliciousRecipes.com (which is a partner of the third-party company), the same company can say: "Oh, I know user TRACKEDUSER424242, he recently looked at a shovel. Let's show it to him now in an ad. He is quite likely to click on it." This is called retargeting.
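A toy Python model can show why the same third-party cookie reaches the tracker on both sites. It rests on one browser rule from the classic setup, that a cookie is sent back to the domain that set it whichever page triggered the request; everything else is a hypothetical simplification (the domain `tracker.example`, the `pixel` and `ad` actions, and a memory of only the last viewed product). Several browsers now restrict third-party cookies by default, so this reflects the classic mechanism described above rather than every browser today.

```python
import itertools

class RetargetingTracker:
    """Hypothetical third-party company embedded on both the shop and the blog."""
    domain = "tracker.example"

    def __init__(self):
        self._ids = itertools.count(424242)
        self.last_viewed = {}  # tracker cookie id -> last product seen

    def handle(self, cookie, action, product=None):
        """Return (cookie to store, response) for a request coming from any site."""
        if cookie is None:
            cookie = f"TRACKEDUSER{next(self._ids)}"
        if action == "pixel":   # loaded by MyGardenTools.com's product page
            self.last_viewed[cookie] = product
            return cookie, ""
        if action == "ad":      # loaded by the ad slot on DeliciousRecipes.com
            return cookie, self.last_viewed.get(cookie, "generic ad")
        raise ValueError(f"unknown action: {action}")

class Browser:
    """Stores one cookie per domain and sends it back to that domain only."""
    def __init__(self):
        self.jar = {}  # domain -> cookie value

    def fetch(self, server, **params):
        cookie, response = server.handle(self.jar.get(server.domain), **params)
        self.jar[server.domain] = cookie
        return response

tracker = RetargetingTracker()
browser = Browser()
browser.fetch(tracker, action="pixel", product="shovel")  # visit to the shop
print(browser.fetch(tracker, action="ad"))                # later, on the blog: shovel
```

Neither MyGardenTools.com nor DeliciousRecipes.com appears in the code: from the browser's point of view, both pages simply load a resource from `tracker.example`, and that is enough for the tracker to link the two visits.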
(For the record: this is a simplified description of third-party cookies. In reality, the mechanism is often more complex, as several companies map and exchange cookie information to track a single visitor across sites.)
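The mapping mentioned in this note is often called cookie syncing (or cookie matching). Here is a hedged Python sketch of the idea, assuming two hypothetical ad companies, A and B, that each only see their own cookie ids. A sync request (for example, a redirect that passes A's id to B's domain) lets B record that both ids belong to the same browser. All names and ids are illustrative.

```python
# Company B's match table: B's cookie id -> company A's cookie id (hypothetical).
match_table = {}

def record_sync(b_cookie_id, a_cookie_id):
    """Called when a sync request tells B which A id goes with its own cookie."""
    match_table[b_cookie_id] = a_cookie_id

def merged_profile(b_cookie_id, b_profile, a_profiles):
    """Union of what B knows and what A shared about the same visitor."""
    merged = set(b_profile)
    a_id = match_table.get(b_cookie_id)
    if a_id is not None:
        merged |= a_profiles.get(a_id, set())
    return merged

record_sync("B-777", "TRACKEDUSER424242")
a_profiles = {"TRACKEDUSER424242": {"shovel"}}  # data shared by company A
print(sorted(merged_profile("B-777", {"recipes"}, a_profiles)))  # ['recipes', 'shovel']
```

Once the ids are matched, neither company needs the other's cookie to recognize the visitor, which is what makes tracking across many sites possible.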
Personalization in advertising
We just touched on an important topic: the power of personalized advertising and its implications for privacy. It is common knowledge that advertising is the business model that powers most sites on the internet. The recipes you read for free on DeliciousRecipes.com are paid for by MyGardenTools.com and other advertisers, who buy ads to reach new customers or to bring existing ones back more often.
What is often less clear are the mechanics and the parties involved in this advertising scheme. One of these parties, and one of Europe's most famous startup success stories, is Criteo, the leader in retargeting. As you may have guessed, Criteo plays the role of the "third-party company" in our previous example: its goal is to show you a personalized ad that gets you to come back to MyGardenTools.com. Its technology has of course evolved a lot and can now do much more complex things than in the shovel example, but the basic concept still holds: show you, on one site (the publisher), a product you already saw on another site (the advertiser) to get you to go back.
Where is the money?
Let's take a look at the financial model behind this scheme: who pays whom, and why? Say you see the shovel ad from MyGardenTools.com on DeliciousRecipes.com and decide to click on it. You are redirected to MyGardenTools.com, and for this click, MyGardenTools.com pays Criteo for showing the ad to the right user at the right time. Criteo, in turn, pays DeliciousRecipes.com for "renting" its advertising space. But what if the user doesn't click? In that case, MyGardenTools.com doesn't pay Criteo, but Criteo still has to pay DeliciousRecipes.com for displaying the ad! This pricing model between MyGardenTools.com and Criteo is called cost-per-click (CPC): MyGardenTools.com only pays Criteo when someone actually clicks on an ad.
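A short Python calculation shows why the retargeting company cares so much about clicks. All numbers below are made up for illustration, and real contracts are more varied (publishers, for instance, are often paid per thousand impressions, or CPM). The key asymmetry is that the retargeter earns the CPC only on clicked impressions but pays the publisher for every impression.

```python
def retargeter_margin(impressions, click_through_rate, cpc, cost_per_impression):
    """Profit of the retargeting company on a batch of ads.

    Revenue: the advertiser pays `cpc` only for impressions that were clicked.
    Cost: the publisher is paid `cost_per_impression` for every impression.
    """
    clicks = impressions * click_through_rate
    revenue = clicks * cpc
    cost = impressions * cost_per_impression
    return revenue - cost

# Hypothetical campaign: 10,000 impressions, 0.5% clicked, 0.50 per click,
# and 0.001 paid per impression (i.e. a CPM of 1.00).
margin = retargeter_margin(10_000, 0.005, 0.50, 0.001)
print(round(margin, 2))  # 50 clicks * 0.50 - 10,000 * 0.001 = 15.0
```

The batch is profitable only when the click-through rate multiplied by the CPC exceeds the cost per impression, which is why the retargeter tries to show each ad to the users most likely to click.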
The goal of the retargeting company (e.g. Criteo) is therefore to maximize the number of clicks on the ads it displays for its customers (the advertisers), since it is paid for each click, while buying advertising space as cheaply as possible on publishers' sites (often media, news sites, blogs and, more generally, "content sites" that make a living from the ads they display). To achieve this goal, the retargeting company has to outbid other bidders in a real-time bidding (RTB) process: the highest bidder for an advertising space wins the right to display its ad and pays for this right (usually the value of the second-highest bid). This means that ads are bought and sold in real time, at floating prices. Advertisers (such as MyGardenTools.com) also don't need to make separate deals with each publisher they want to work with: middlemen such as Criteo give them access to a global market and automate the whole procedure as much as possible, which is why this automated ad buying process is often referred to as "programmatic".
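The auction rule described above can be sketched in Python as a sealed-bid second-price auction. The bidder names, bid amounts and the optional reserve (floor) price are hypothetical; real exchanges add many more rules, so this only illustrates the pricing logic.

```python
def second_price_auction(bids, reserve=0.0):
    """Sealed-bid second-price auction for a single ad slot.

    `bids` maps bidder -> bid. Bids below `reserve` are ignored. The highest
    bidder wins and pays the second-highest eligible bid, or the reserve price
    if it is the only eligible bidder. Returns (winner, price) or (None, 0.0).
    """
    eligible = {bidder: bid for bidder, bid in bids.items() if bid >= reserve}
    if not eligible:
        return None, 0.0
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    runner_up = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, max(runner_up, reserve)

bids = {"retargeter_a": 2.40, "bidder_b": 1.90, "bidder_c": 0.80}
print(second_price_auction(bids))  # ('retargeter_a', 1.9)
```

Paying the second-highest price is meant to encourage bidders to bid what an impression is truly worth to them, such as the expected revenue from a click.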
What about the user in all this?
I started by talking about useful personalized services for users and ended up talking about companies programmatically buying and selling advertising space, and users installing ad blockers. From a pure technology standpoint, these industries are similar: both rely on data collection and on machine learning algorithms that predict the next product, song or video a user will click on.
Criteo started as a movie recommender system in 2005, became a B2B product recommendations SaaS in 2006, and pivoted to CPC-based ads - or shall I say "off-site product recommendations" - in 2008, a concept then dubbed "retargeting". But let us take a step back.
What is particularly odd about personalized ads and retargeting is that, no matter how complex and intelligent the process connecting publishers and advertisers is, the whole system still hinges on one simple act: a user clicking on a product ad. It is this click that generates money "out of thin air": revenue for both the publisher and the retargeting company, and possibly a lead for the advertiser.
The user is not aware of the whole process: he may not realize that his click generates money, and he has probably never heard of the retargeting company that tracks his behavior around the web. The infamous saying "If you are not paying for it, you are the product" takes on its full meaning.
The major problem most people seem to have with personalized advertising is not that the ad is tailored to their browsing behavior, but that it is shown completely out of context. Why would you see a shovel on a recipe site? This feeling of being tracked, without knowing which parts of your personal data were shared and with whom, is one of the main reasons people are uncomfortable with targeted ads.
Hold on: why, then, are most people comfortable with Deezer, Netflix, Youtube, Twitter, Facebook or Amazon collecting their personal data and personalizing their experience inside these services? These services may collect a lot of data on your behavior and run algorithms to predict what you will click on next, but "it's ok" as long as the goal is to improve your experience inside the service and the data is not used for anything else. As a user, you enjoy the personalized experience of Twitter's feed, the Facebook timeline, or Youtube's recommendations sidebar. It wouldn't have the same effect on you if these services simply showed you popular or random items (posts, videos, songs).
From advertising to discovery
In this discussion of user-generated, cookie-based data, we mentioned two different use cases:
- Personalized services, which have the knowledge and resources to adapt the experience to each user's interests and help them discover new content (movies, books, music, news, products). A better, more personalized service makes users more loyal to and engaged with the service.
- Personalized ads attract users to a new site, or remind users of products they previously looked at (retargeting). The increasing number of users of ad-blocking software shows that many people are annoyed by targeted ads.
In both cases, the technologies are similar (collecting click data with cookies, and using machine learning to predict interests and buying intent). However, the financial models behind these two use cases, and the way users perceive them, are very different.
On a more general level, whether we are talking about ads or recommendations, both boil down to discovery, in the sense of spontaneously finding content you were not initially looking for. With search, you can easily find products on Amazon or videos of your favorite singer on Youtube. This works well, but you need to know what you are looking for.
When you open your favorite app and scroll through your Facebook timeline, Twitter feed or Pinterest boards, you are not looking for a particular product. You are looking for an experience. This is a big deal, because it drastically changes the role of the service (and therefore the type of technology it uses). A great online shop used to be one with the widest choice of products and a great search feature (fast shipping and good customer service being prerequisites). Amazon won the race on these criteria.
But consumers no longer visit a service only because they are looking for one particular item. They come to a service because they want the content it is known for (videos on Youtube, pictures on Pinterest, news on Twitter, etc.) sorted in a way that is relevant to them. This is a whole new challenge. The user's profile on the service acts as a set of filters that removes content that is not relevant to this user. This filtering (or rather, sorting) of information is based on data collected as the user interacts with the service (friends and followers, clicks on items in a news feed, etc.). The infinite feed is a common graphical representation of this concept: a continuous flow of information tailored to the user's tastes and generated automatically from their profile.
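Here is a deliberately simple Python sketch of a profile acting as a sorting filter. It assumes a profile is nothing more than a count of the topics a user interacted with, and that each candidate post has a single topic; real feeds combine many signals (recency, friends, predicted engagement), so the names and scoring here are illustrative only.

```python
from collections import Counter

def build_profile(interactions):
    """Profile = how many times the user interacted with each topic."""
    return Counter(interactions)

def rank_feed(posts, profile):
    """Sort (post_id, topic) pairs by the user's affinity for the topic.

    Topics the user never interacted with score 0; equal scores keep their
    original order because Python's sort is stable.
    """
    return sorted(posts, key=lambda post: profile[post[1]], reverse=True)

profile = build_profile(["gardening", "gardening", "cooking"])
posts = [("p1", "politics"), ("p2", "cooking"), ("p3", "gardening")]
print([post_id for post_id, _ in rank_feed(posts, profile)])  # ['p3', 'p2', 'p1']
```

Every new interaction updates the counts, so the same pool of posts is re-sorted differently for each user and over time, which is what makes an infinite feed feel tailored.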
On such services, the line between ads and personalized content is extremely thin. Facebook is probably the best example: it lets users create a profile that then shapes their specific experience on Facebook, both by showing each user their own personalized social network and by targeting ads based on their account. What is particularly interesting about targeted ads on Facebook is that users can control which types of ads they want to see and which of their data is used for advertising.
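The idea of user-controlled ads can be sketched as one more filtering step before ranking. This Python example is a hypothetical illustration, not Facebook's actual system or API: the `AdSettings` fields are invented to show how hiding a category or turning off profile-based targeting changes which ads are shown and in what order.

```python
from dataclasses import dataclass, field

@dataclass
class AdSettings:
    """Hypothetical per-user ad preferences."""
    hidden_categories: set = field(default_factory=set)
    allow_profile_targeting: bool = True

def select_ads(candidate_ads, user_interests, settings):
    """Filter and order (ad_id, category) pairs according to the user's settings."""
    # 1. Drop every ad in a category the user chose to hide.
    allowed = [ad for ad in candidate_ads if ad[1] not in settings.hidden_categories]
    # 2. Only if the user allows it, move ads matching their interests to the top
    #    (the sort is stable, so the remaining order is preserved).
    if settings.allow_profile_targeting:
        allowed.sort(key=lambda ad: ad[1] in user_interests, reverse=True)
    return allowed

ads = [("a1", "gardening"), ("a2", "gambling"), ("a3", "cooking")]
settings = AdSettings(hidden_categories={"gambling"})
print(select_ads(ads, {"cooking"}, settings))  # [('a3', 'cooking'), ('a1', 'gardening')]
```

With `allow_profile_targeting=False`, the same user still sees ads, but in their original order, with no use of the interest profile.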
Research on this topic suggests that personalization combined with user-controlled data-sharing policies drives the highest engagement and satisfaction[2].
Judging by its growth, both financial and in number of users, Facebook seems to have made all the right moves to combine the personalization of its social network (algorithmic feed, social widgets on third-party sites) with its own ad network. One of the key elements of this success is the way Facebook has addressed the privacy of its ever-growing user base. Facebook ads are part of the experience you get on Facebook. Just like posts from your friends, sponsored posts and sidebar ads are tailored to your tastes, respect your privacy settings, and are based on the personal data you leave on Facebook.
Reinventing personalized product discovery
Whether we are talking about anonymous cookies or rich social media profiles, the amount of personal data available about a user heavily affects the experience that user will get. This holds for advertising as well as for offering a better, more personalized service.
At TasteHit, we have been working with online shops for several years and have come to know their most important pain points. Online shops typically face two major challenges:
- Acquiring traffic: getting users to land on their shop, often through different types of advertising (targeted ads, mail, paid search).
- Increasing their conversion rate using real-time marketing (such as what we do at TasteHit), A/B testing or live chat. Real-time marketing has a strong impact on user behavior, as it helps users discover products and therefore encourages impulse buying.
In both cases, personalized discovery is key. Visitors have different tastes, and they are used to services such as Facebook, Twitter and Youtube, which deliver information in a personalized way, based on each visitor's personal data, without even the need to search.
At TasteHit, we help online shops become more successful using personalization. I believe that online commerce is undergoing a transformation, in which many ideas coming from social networks (social relationships, infinite feeds, merging advertising with the on-site experience, user-controlled privacy settings) can be reused to make browsing and discovery more enjoyable for users.