Data obfuscation: if you can’t avoid being tracked, cheat the algorithm
Living in the digital age means being continually tracked. Our daily activities leave a footprint in cyberspace that, properly processed, helps companies sell us their products. The implicit agreement is this: companies get free access to our data in exchange for the promise of better services. But not everyone is willing to hand over personal information so lightly for other companies to profit from. Knowing everything about us can violate our privacy, and being hounded by advertisements and offers at all hours can also be exasperating.
Is there a way to escape the scrutiny of tech companies? It is very difficult, especially if we use their services daily. But there is a way to rebel from within the system: offering incorrect data to confuse those who collect it. A simple approach is to give a false name and address when a web page asks for that information during registration; a more elaborate one is to use special programs that automatically click on every banner we are shown, making it difficult to infer our tastes. This is called data obfuscation and, if practiced by a sufficiently large critical mass of people, it would end up rendering big data irrelevant. Because without quality data, algorithms are useless.
The industry is aware of the power of this practice. So much so that the consulting firm Gartner included algorithm hacking (as it is also called) as one of the five consumer trends for marketers to watch in 2020 in its annual trend report. It is unclear how many people practice data obfuscation. According to Gartner analyst Kate Muhl, around a quarter of consumers do so in their day-to-day lives, even if only by using a false name when filling out an online form.
Nor can it be established what proportion of false data would be enough to collapse the system: databases vary widely in size, and machine learning algorithms differ in nature. Still, a simulation carried out with a movie recommendation algorithm led some researchers to conclude that if 30% of users fed false data into the system, its effectiveness would drop to 50%.
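The underlying effect can be illustrated with a toy model (this is not the researchers' actual experiment; the recommender, rating scale, and noise levels are all invented here for illustration): a naive recommender estimates each movie's quality by averaging user ratings, and a fraction of users submit random ratings instead of honest ones.

```python
import random

random.seed(42)

N_USERS, N_ITEMS = 1000, 50

def simulate(poison_fraction):
    """Mean error of an average-rating recommender when a given fraction
    of users submit random (obfuscated) ratings instead of honest ones."""
    true_quality = [random.uniform(1, 5) for _ in range(N_ITEMS)]
    errors = []
    for quality in true_quality:
        ratings = []
        for _ in range(N_USERS):
            if random.random() < poison_fraction:
                ratings.append(random.uniform(1, 5))     # obfuscated rating
            else:
                honest = quality + random.gauss(0, 0.5)  # honest, with noise
                ratings.append(min(5.0, max(1.0, honest)))
        estimate = sum(ratings) / len(ratings)
        errors.append(abs(estimate - quality))
    return sum(errors) / len(errors)

print(f"mean rating error, honest users: {simulate(0.0):.3f}")
print(f"mean rating error, 30% poisoned: {simulate(0.3):.3f}")
```

Even in this crude sketch, poisoned ratings drag every estimate toward the scale's midpoint, so the recommender's picture of each movie blurs as the poisoned fraction grows.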
More boredom than activism
Despite its pompous name (data obfuscation, or algorithm alteration), providing false information is not a recent invention. In the military sphere, for example, it has been practiced since ancient times to confuse the enemy. “Lying is very intuitive, almost intrinsic to human beings. Many countries even recognize a right to lie: in the United States there is the First Amendment, and some legal systems allow a defendant to lie at trial in their own defense,” explains the philosopher Carissa Véliz, who in her book Privacy is Power (Bantam Press, 2020) lists data obfuscation among the courses of action available to protect our privacy.
“Data obfuscation is much more widespread than we think,” says Gemma Galdón, director of the algorithm-auditing firm Eticas Consulting. “But rather than as an act of activism, it is done out of discomfort or weariness. And that happens because at some point the public lost their trust in technology,” she adds. Lying when filling out the registration form for a store loyalty card, or covering your face with your coat as you pass a security camera, are two simple and very common examples.
“It is assumed that all data provided voluntarily by people has a very low degree of reliability. Its market value is practically zero,” Galdón explains. Hence, what sells at a premium is data obtained from banks or from processes in which the citizen is not free to lie.
“Data obfuscation is the deliberate contribution of ambiguous, confusing, or false information to interfere with digital surveillance and data collection,” write Finn Brunton and Helen Nissenbaum in their canonical work Obfuscation: A User’s Guide for Privacy and Protest (MIT Press, 2015). A recent study by researchers at Northwestern University (Illinois) distinguishes three ways to confuse algorithms: data strikes, data poisoning, and feeding data to the competition.
1. Data strikes
The first step in eroding the power of surveillance capitalism is to refrain from generating data, or to erase what has already been handed over. This can be done by deleting accounts on platforms that are not considered essential or by installing privacy tools.
The Northwestern University researchers cite as examples the boycotts of Facebook promoted in the US by civil rights associations, and of Uber after cases of sexual harassment were reported within the company.
2. Data poisoning
This second form of obfuscation consists of consciously providing meaningless data that confuses whoever collects it. The easiest way is simply to lie: liking songs you actually hate on a music streaming service, for example, or inventing a new email address each time a website requires one to grant access.
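That last tactic can be sketched in a few lines (the domain below is a placeholder, not a real disposable-mail service): generate a fresh, random address for every registration form.

```python
import secrets
import string

def throwaway_email(domain="example.org"):
    """Return a random, single-use address to type into a registration form.
    The domain is a placeholder; in practice you would use a disposable-mail
    or catch-all domain you control."""
    alphabet = string.ascii_lowercase + string.digits
    local_part = "".join(secrets.choice(alphabet) for _ in range(12))
    return f"{local_part}@{domain}"

print(throwaway_email())
```

Each call yields a different address, so no two forms ever see the same identifier.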
Another, more complex, approach is to dazzle the algorithm: send it a large amount of data, all of it false, so that the profile it builds of us becomes even less precise. Browser extensions such as AdNauseam do that work for us, automatically and invisibly clicking on every ad we are shown as we browse. The objective is that Google Ads cannot make sense of all that information, which is also erroneous. Another option is TrackMeNot, which launches a constant stream of random searches on Google so that our true preferences are diluted.
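The idea behind a tool like TrackMeNot can be sketched as follows (a simplified illustration, not the extension's actual code; the decoy vocabulary and pacing here are invented): issue plausible random queries at irregular intervals so that genuine searches drown in noise.

```python
import random
import time
import urllib.parse

# Invented decoy vocabulary for illustration; the real extension seeds
# its queries from public word lists and feeds.
DECOY_TERMS = [
    "weather radar", "lasagna recipe", "used bicycles", "tax deadline",
    "movie showtimes", "hiking trails", "jazz history", "battery life",
]

def decoy_query_url():
    """Build a search URL for a random one- or two-term decoy query."""
    terms = random.sample(DECOY_TERMS, k=random.randint(1, 2))
    return "https://www.google.com/search?" + urllib.parse.urlencode(
        {"q": " ".join(terms)})

# Emit a few decoys at irregular intervals, as real browsing would.
for _ in range(3):
    print("decoy:", decoy_query_url())
    time.sleep(random.uniform(0.1, 0.4))
```

From the search engine's side, the handful of real queries are now statistically indistinguishable from the stream of decoys.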
Last year, a group of American teenagers came up with an ingenious way to drive Instagram crazy. They decided to share a single account on the social network. Every time someone in the group wanted to log in, they simply asked whoever was connected to reset the session. After that request, the company automatically sends a passcode to the device from which it was made, and the person connected only has to share that key with whoever wants to join. The result: the algorithm showed photos of Kobe Bryant, baking recipes, cars… nothing to do with the preferences of any one of them.
3. Feed the competition
The third way to deceive the algorithms is to consciously hand your data to a competitor of the platform you want to protest against: uploading your photos to Tumblr instead of Facebook, for example, or using the DuckDuckGo search engine instead of Google. The goal is to stoke competition between platforms.
The consequences of these actions can spill over into physical space. In the summer of 2019, Uber and Lyft drivers in Washington coordinated to raise fares, which determine their pay. The mechanism: simultaneously switching off the apps for a few minutes, so that the algorithm believes there are few drivers available and raises fares, and then switching them back on.
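A toy surge model shows why logging off in unison pays (purely illustrative; the real pricing algorithms are proprietary and far more complex): the fare multiplier grows with the ratio of waiting riders to online drivers.

```python
def surge_multiplier(riders_waiting, drivers_online, base=1.0):
    """Toy model: the fare multiplier scales with the demand-to-supply
    ratio and never drops below the base fare. Purely illustrative;
    real ride-hailing pricing is proprietary and more complex."""
    return base * max(1.0, riders_waiting / max(drivers_online, 1))

print(surge_multiplier(60, 60))  # balanced supply: no surge
print(surge_multiplier(60, 20))  # drivers log off together: fares triple
```

In this sketch, two-thirds of the drivers going dark for a few minutes is enough to triple the multiplier, which is exactly the window the Washington drivers exploited before logging back on.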