Privacy Matters

Do I have security and privacy when I use COVID Nearby?

The COVID Nearby platform offers both security and privacy with the strongest possible types of guarantees. COVID Nearby uses the powerful method of differential privacy to protect the information it releases against any current or future attacker.

What is the difference between security and privacy? Isn’t my privacy protected automatically when I use a secure app?

Security concerns secure user authentication and the secure exchange of information. Privacy is a different concept altogether: it refers to the risk a person is exposed to when she allows others to look at her information. By analogy, think of security as the process of letting only authorized people enter a building, whereas privacy corresponds to ensuring that nobody other than the person entering the building knows whether she entered it.

In a symptom-tracking app, I don’t provide my name, and such apps report the symptoms for a group of people, not one person. Then, why is privacy still a problem?

Obviously identifying information, such as your name or email, is only one kind of information that attackers use to find out who you are. Numerous scientific studies have shown that simply deleting obvious identifiers does not protect privacy. Privacy experts have analyzed such anonymized data and fully recovered the identities of many users in a database. For example, an attacker may launch an association privacy attack. This attack systematically links the information provided by a symptom-tracking app, which contains no obvious identifiers, to outside sources of information that may contain them, and thereby reveals the identities of individuals who reported their symptoms anonymously. Privacy experts demonstrated the power of this attack on anonymized Netflix data: users behind anonymous Netflix ratings could be identified by linking those ratings to publicly available IMDb data. Thus, removing obvious identifiers often fails to protect the privacy of individuals.
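
The linking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (zip, birth_year, sex), and the outside "public directory" are all hypothetical, and real attacks typically use fuzzier matching than the exact match shown here.

```python
# Hypothetical "anonymized" symptom reports: no names, only indirect attributes.
symptom_reports = [
    {"zip": "02139", "birth_year": 1985, "sex": "M", "symptom": "fever"},
    {"zip": "02139", "birth_year": 1950, "sex": "F", "symptom": "cough"},
    {"zip": "02140", "birth_year": 1972, "sex": "F", "symptom": "none"},
]

# Hypothetical outside source that does contain names.
public_directory = [
    {"name": "John Doe", "zip": "02139", "birth_year": 1985, "sex": "M"},
    {"name": "Jane Roe", "zip": "02139", "birth_year": 1950, "sex": "F"},
    {"name": "Ann Poe", "zip": "02141", "birth_year": 1972, "sex": "F"},
]

# Attributes present in both sources; together they can single out a person.
SHARED_FIELDS = ("zip", "birth_year", "sex")


def key(record):
    """Combine the shared attributes into one lookup key."""
    return tuple(record[field] for field in SHARED_FIELDS)


def link(reports, directory):
    """Return {name: symptom} for every report that matches exactly one person."""
    names_by_key = {}
    for person in directory:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for report in reports:
        names = names_by_key.get(key(report), [])
        if len(names) == 1:  # a unique match re-identifies the report
            matches[names[0]] = report["symptom"]
    return matches


reidentified = link(symptom_reports, public_directory)
print(reidentified)
```

In this toy data, two of the three "anonymous" reports match exactly one named person, even though no report contains a name.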

You may think that reporting anonymous aggregate information about groups of people (as opposed to information about a single person) protects their privacy. Unfortunately, this isn’t true either. There are many examples of attacks on privacy that fully recover the identities of individuals whose information was included in the reported aggregates. Here is a simple example. Let’s say your friend John Doe, who is 35 years old, lives alone in an area populated by seniors. Some people in this area use a symptom-tracking app that, for any selected area, reports symptoms and the average age of the people who shared their data. Then two queries (i.e., questions) about the average age, one for an area that includes John’s house and one for the same area without it, give a pretty good estimate of whether or not John gave his personal information to the symptom-tracking app: if the two averages differ, someone in John’s house shared data, and John lives alone. With more sophisticated queries, which are still about a group of individuals and not about any single individual, an attacker can reconstruct all the information John provided.
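
The two-query attack on John can be worked through with made-up numbers. The sketch below assumes a hypothetical app that answers average-age queries exactly and also reports how many people contributed to each answer; the house names and ages are invented for illustration.

```python
from statistics import mean

# Hypothetical ages of the people who shared data, keyed by house.
participants = {
    "house_1": 70,
    "house_2": 75,
    "house_3": 80,
    "house_4": 68,
    "john_house": 35,
}


def average_age(houses):
    """The aggregate query: exact average age and number of participants."""
    ages = [participants[h] for h in houses if h in participants]
    return mean(ages), len(ages)


area_with_john = ["house_1", "house_2", "house_3", "house_4", "john_house"]
area_without_john = ["house_1", "house_2", "house_3", "house_4"]

avg_with, n_with = average_age(area_with_john)
avg_without, n_without = average_age(area_without_john)

# If the answers differ, someone in John's house shared data,
# and John lives alone.
john_participated = avg_with != avg_without

# With the counts, the two averages also reveal John's exact age:
# (sum of ages with John) - (sum of ages without John).
john_age = avg_with * n_with - avg_without * n_without

print("John participated:", john_participated)
print("John's age:", round(john_age))
```

Neither query asks about John directly, yet comparing them isolates his contribution. This is why exact aggregate answers alone do not protect individuals.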

COVID Nearby presents statistics to the users and allows health experts and scientists to do data analysis. As a user, how is my privacy protected?

COVID Nearby builds on the groundbreaking idea of differential privacy, a mathematical guarantee for your privacy. Differential privacy rests on the following idea: analyze the data of a group of people in such a way that the analysis has (almost) the same outcome even if any single individual were not in the group. This is achieved by slightly and randomly altering the correct answer of the data analysis, so that focusing on the group protects the individual. By giving slightly less accurate yet still useful answers, the algorithm gets the best of both worlds: the result of the data analysis is very close to the correct one, and the identity of individuals in the data is also protected. Such privacy algorithms, including those at the heart of COVID Nearby, guarantee that no attacker can reliably tell from the results of the data analysis whether you shared your data. When it is impossible to tell whether your data is in the database, it is also impossible to link your data and violate your privacy. For example, if you visited the ER and submitted symptom information through the app, it is impossible to link the two.
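
To make "slightly and randomly altered" concrete, here is a minimal sketch of one standard differentially private technique, the Laplace mechanism, applied to a counting query. It is an illustration only and not a description of the algorithms COVID Nearby actually uses; the function names, the records, and the value of epsilon (the privacy parameter, where smaller means more noise and stronger protection) are all assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential samples with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count of the records that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so Laplace
    noise with scale 1/epsilon gives epsilon-differential privacy for this
    single query. Illustration only: production systems also need to handle
    floating-point subtleties and a budget across repeated queries.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


def has_fever(record):
    return record["fever"]


# Hypothetical data: the same group with and without your record.
others = [{"fever": True}, {"fever": False}, {"fever": True}]
you = {"fever": True}

rng = random.Random()
epsilon = 0.5  # assumed value for the sketch

print("Without you:", private_count(others, has_fever, epsilon, rng))
print("With you:   ", private_count(others + [you], has_fever, epsilon, rng))
```

Each run prints different answers. Because the noise is of the same order as one person's contribution, an observer who sees a single answer cannot confidently say which of the two groups produced it, while the answer still stays close to the true count.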

Why does COVID Nearby emphasize protecting the privacy of its users against all possible attacks? And why should an everyday user care about mathematical privacy guarantees? 

We hear of privacy breaches all the time. The list of privacy attacks keeps growing, with no end in sight. When your data is protected in an ad hoc way, there is no guarantee that it will be protected against a future attack. For example, someone could obtain symptom-tracking data today and, a couple of years later, an attacker could use new techniques to identify the individuals who participated.

Even making assumptions about the attacker’s expertise, background knowledge, or capabilities is dangerous, because attackers deliberately behave in atypical ways (that is the whole point). Therefore, to truly protect privacy, the protection method must account for all attacks and all the computational tools that could be used to carry them out. There is no practical way to validate a claim of privacy protection by testing it against every conceivable attack. The only way to guard your privacy, now and in the future, is through a rigorous, mathematically provable guarantee. COVID Nearby relies on exactly this type of mathematical privacy guarantee: it is not a luxury but a necessity.