You don't have privacy on the Internet

I've realised something disturbing: Potentially, you have no privacy when you browse the Internet (let alone actively communicate with it) - and you can mitigate this only by means which severely impact usability. Here's how it happens, what you can do to protect yourself, and why the onus will always be on you to do so.

The problem

Ad companies claim that online tracking is anonymous. It's not.

The above article by a researcher at Stanford is a great explanation of what is probably the biggest problem with browsing the Internet: Your visit to almost every popular website is tracked by ad networks. This interactive infographic from the Wall Street Journal demonstrates how each visit to the 50 most popular websites in the US is tracked by up to hundreds of elements, and the case with the most popular kids' websites is even worse. Here's a list of the top 100 webpage elements used to track you.

Companies often claim the data they collect is "anonymous" because they don't record your name or other data that directly identifies you. This is false - the data is more than enough to uniquely identify you (I'll explain how below). If desired, they can link that data to your "real-world" information - name, address, etc. - thereby generating a detailed profile of you and your history of browsing, purchasing, and other online interactions. There's a growing market for such services, called "de-anonymizing", a kind of data-mining that turns supposedly anonymous information into real identities.

This is just part of the larger issue of increasingly widespread privacy violations by private companies that have very little accountability.

Customer data is valued immensely by corporations, and you're giving it away constantly just by loading webpages. Imagine if someone read through your browser history every day. They could learn a lot of very revealing information, just from the list of pages you visit - for example, your medical or mental health issues and sexual preferences. Major ad networks have the capability to do that, for the sites their scripts run on - that is, almost all the sites you're likely to visit. They also work to gradually change your behaviour. Do marketing companies and random websites really deserve your trust - that they won't use your data in an undesirable way, or hand it on to third parties? And if they're trustworthy for that (which is doubtful, since they have little or no accountability for how they use your data), do you also trust that they won't be hacked, or subverted by a rogue employee?

As a quick aside: Why should you care? The most common objection at this point is "only people with something to hide (i.e. criminals) need privacy". A lot of people really do seem to think that it's okay to criminalize privacy, and to look at someone with suspicion because they don't share all their photos with the world on Facebook. This view is misguided, naive, hypocritical, and ultimately terrifying. This article in The Chronicle addresses it well. There is a basic human need for privacy, whether online or not, and it's not primarily about hiding bad things, but about reducing misunderstanding and abuse. The Urewera terror raids in New Zealand were an excellent, albeit extreme, example of how a lack of privacy can result in the abuse of many innocent people.

How the networks track you, and what you can do about it

You might think that your privacy is protected by virtue of sharing a connection (IP address) with others, or being with an ISP that gives you a dynamic IP address (an address which sometimes changes). Firstly, there are statistical methods to separate users with a known probability of correctness; more importantly, all such protection is disappearing under IPv6, where there are enough addresses for every machine to have a permanent address. But in any case, tracking companies don't even need your IP address to uniquely identify you. They can use your browser.

As a starting point you should block unnecessary cookies and hide your IP address through a proxy or VPN (everyone should be using a proxy or VPN anyway for public WiFi). Even then you can still be uniquely identified in your browser, because websites can re-create any of their cookies that you remove or block. Your browser provides a huge amount of information to websites through JavaScript. EFF's Panopticlick project demonstrates how that information is enough to uniquely identify you. Worse, you can even be tracked using only your browser's cache. The only full protection is to disable caching and completely disable JavaScript - which stops most websites from displaying properly, or even from functioning at all. Tor Browser does all of the above and is widely considered the best way to protect your privacy online - but expect a frustrating experience, as your browsing is much slower and websites depending on JavaScript (most of them) fail to work properly.
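
To make the fingerprinting idea concrete, here is a rough Python sketch of the arithmetic behind it. It is not Panopticlick's actual code: the attribute names, their values, and especially the "share of browsers" figures are made up for illustration, and adding the bits together assumes the attributes are independent of each other, which overstates the total for a real browser. The point is only that each attribute contributes a few bits of identifying information, and that about 33 bits are enough to single out one person among 8 billion.

```python
import hashlib
import math

# Hypothetical values a tracking script could read from one browser.
browser_attributes = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "screen": "1920x1080x24",
    "timezone": "Pacific/Auckland",
    "language": "en-NZ",
    "fonts": "DejaVu Sans,Liberation Serif,Noto Color Emoji",
}

# Made-up fraction of all browsers that share each value above.
# Real figures would have to come from measurement.
assumed_share = {
    "user_agent": 1 / 2000,
    "screen": 1 / 8,
    "timezone": 1 / 400,
    "language": 1 / 300,
    "fonts": 1 / 5000,
}


def fingerprint(attributes):
    """Hash the attributes, sorted by name, into one stable identifier."""
    canonical = "\n".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def surprisal_bits(share):
    """Bits of identifying information revealed by a value with this share."""
    return -math.log2(share)


def total_bits(shares):
    """Sum of the bits, ASSUMING the attributes are independent."""
    return sum(surprisal_bits(share) for share in shares.values())


def bits_to_single_out(population):
    """Bits needed to pick one individual out of a population."""
    return math.log2(population)


if __name__ == "__main__":
    print("fingerprint:", fingerprint(browser_attributes)[:16], "...")
    for name, share in assumed_share.items():
        print(f"{name:>10}: {surprisal_bits(share):5.2f} bits")
    print(f"total (independence assumed): {total_bits(assumed_share):.1f} bits")
    print(f"needed for 8 billion people:  {bits_to_single_out(8_000_000_000):.1f} bits")
```

No cookie or IP address appears anywhere in this sketch. The same browser produces the same hash on every site that runs the script, and changing a single attribute (say, the timezone) produces a different one.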

I compromise and use a raft of browser add-ons and custom settings to make me much more difficult to track. Some of these make browsing more complicated and frustrating, but they also make me less susceptible to hacks and hoaxes. I also use Startpage (scraped Google) to search without being logged or filtered.

So far we're only talking about browsing - if you actively submit any information, privacy gets much more difficult. I'll leave that to my privacy 101 article, though there are some app tips below.

The big picture

So while difficult and frustrating technical defences requiring technical savvy may block most of the holes, other holes are left wide open. The market will never fix itself because privacy is a lemon market. Consumers have no privacy information by which to make judgements. The only way for us to have privacy on the Internet is legal protection. Companies need to disclose how they use our information and who they give it to. They need to be prevented from overriding the requests of users not to be tracked. They need to be held accountable for the abuses that happen (never mind their abysmal security which results in huge data sets being stolen on a weekly basis).

While government privacy bodies do some great work, it's like trying to stop the tide. They have few resources, pitted against the standard operating practice of the knowledge economy: an almost complete lack of transparency or accountability around personal data usage. Companies are not going to willingly give up the lucrative benefits of pervasive data-mining, technical tricks to track users against their explicit wishes, secret sharing of data with third parties, and insecure storage (good security is expensive).

Even within governments, the privacy protectors are hopelessly outmatched. Governments are responsible for the greatest privacy abuses of all - particularly military and police, but also most other departments, because of their wide access, and especially when they share their data - and they are consistently pushing for ever-more invasive ways to collate data and surveil the populace. To adopt the terminology of the Chronicle article: In some countries that data is used to capture and torture activists, promulgate opposing propaganda, or shut down dissent in other ways (Orwellian privacy abuse), but in all countries the Kafkaesque abuses of bureaucracy, mistakes, and lack of transparency pose a real problem for your privacy. Just because you haven't seen the effect yet doesn't mean there isn't a problem. In addition, digital storage means that your data is generally kept forever - the issues with this are numerous: changes in government leadership (you might trust this government, but how about the next, or the one in twenty years' time?), changes in laws (e.g. data-mining to identify potential criminals), changes to officials and consultants managing the data, and changes in society (e.g. some behaviours society considers acceptable now will be shocking in the future).

Having actual privacy

Preventing the government from spying on your online activity is harder still: it's possible, but you need good technical knowledge and careful awareness of your exposure. Full hard-drive encryption is a minimum requirement, as is Tor Browser, but if you don't want people or firewalls knowing you're using Tor you'll need a traffic shaper like SkypeMorph, which makes your traffic look like a Skype video call. Against governments, full hard-drive encryption isn't enough - you need deniable encryption, for instance with a tool like VeraCrypt. For communication you need to use Off-the-Record Messaging (for instance using Signal, Jitsi, or Pidgin), friend-to-friend networking and steganography tools like OpenPuff. Of course you can't log in to websites like Google and Facebook which provide your private data to governments automatically - so you'll also need an alternative email provider, and alternative social networking software (preferably distributed) - and you need to be very conscious about who you're sharing with, and about separating your private and public identities.

Update 2015: On your desktop PC you need to use Linux or some operating system other than MacOS or Windows, which spy on you. For mobile you need some kind of Android alternative without any Google apps. iOS and Google Play Services track your location at all times, even when location is off or even when your whole phone is off (transmitting back to base when you next connect to a network). They also track when you use each app and for how long.

Eventually there may be a redesign of computing with privacy and security in mind, but it would be a very expensive proposition, and Big Tech doesn't have the incentive to do it. Until then you can only do your best to minimize your exposure and educate your friends.

(This post was originally posted to Posterous, and later to Facebook)

gracefool