Primer on Decentralized Contact Tracing

Background information for the discussion at the MyData vs COVID-19 Wednesday call on 2020-04-15

Manu Eder (TCN, COVID Watch)

Why Contact Tracing?

What are our options with respect to COVID-19?

  1. Get it over with fast.
    Consequences: 2/3 of the population get COVID-19. Lots of people are sick at the same time.
    Lots of people die because the health care system is overwhelmed and no one can take care of them. Hopefully 2/3 of the population are now immune, which will stop the spread. (Currently, though, it is unclear how long immunity will last.)

  2. Get it over with, but slowly enough so as to not overwhelm the health care system.
    Consequences: 2/3 of the population get COVID-19, but not everyone at the same time.
    Fewer people die, but lockdown measures will have to be in place for over a year (depending on healthcare capacities). Again, what if immunity doesn’t last longer than a few months? Maybe there’ll be a vaccine at some point?

  3. Do not get it over with; that is, instead of accepting that 2/3 of the population get infected, try to contain the spread. The goal is that as few people as possible get sick. But how is this possible? There will always be new cases somewhere. Even if we manage to get numbers down at some point, how do we stop everything from starting anew?

Contact tracing is one tool that tries to help solve this problem. Other measures will be necessary as well if we aim for the last option, option 3.

Marcel Salathé (DP3T) on Twitter:

The idea behind contact tracing: find the contacts of an infected person - they may have been exposed, and should go into quarantine. I'm reposting the illustration that @ncasenmare made 2/12

— Marcel Salathé (@marcelsalathe) April 4, 2020

Contact tracing is most attractive once other measures have brought the number of infections in a region down to a small number. Then we can realistically expect to follow up on every single new infection and break transmission chains before there are lots of infected people again. Some infections will initially go undetected, but by following up on every infection that is noticed and testing contacts, we can hope to eventually discover asymptomatic spreaders as well.

If we are in a situation where tens of thousands of infections go completely undetected and there are thousands of infections that we cannot follow up on because we lack resources, then contact tracing has much less of an impact.

Here we will focus on the optimistic scenario that we are first able to bring down infection numbers through a combination of measures (most likely including tight lockdown for a month or two). This will influence some choices in the discussion later on. If you do not believe that this scenario is realistic, then not all of those choices will make sense.

Why decentralized contact tracing? What does that mean?

By decentralized contact tracing we mean a solution to the contact tracing problem which does not require a single central authority to know where everyone is all the time. Such a solution is possible, and we think it’s easiest to build it around the Bluetooth technology built into almost all modern smartphones.

Why contact tracing based on Bluetooth?

GPS is relatively coarse-grained and doesn’t work in the metro, inside buildings, etc. GPS data is also hard to anonymize or process in a privacy-preserving way, and GPS uses quite a bit of battery.

Bluetooth has a relatively short range (about 10m, depending on conditions) and allows devices to broadcast messages to nearby devices. With some calibration it’s probably possible to detect only devices that are within approximately 2m. This makes it an ideal building block for an app-based contact tracing solution.
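The "approximately 2m" cut-off would in practice have to be estimated from received signal strength (RSSI). One common textbook approach is the log-distance path-loss model, which the sketch below simply inverts. The calibration constants are hypothetical, and real RSSI readings are noisy (phone model, pocket vs. hand, bodies in between), so treat this as a rough heuristic rather than a distance measurement.

```python
# Hypothetical calibration values: real ones differ per phone model,
# orientation, carrying position and environment.
TX_POWER_AT_1M_DBM = -60.0   # assumed RSSI measured at a distance of 1 m
PATH_LOSS_EXPONENT = 2.0     # about 2 in free space; typically higher indoors


def estimate_distance_m(rssi_dbm: float,
                        tx_power_dbm: float = TX_POWER_AT_1M_DBM,
                        n: float = PATH_LOSS_EXPONENT) -> float:
    """Invert the log-distance path-loss model:
    rssi = tx_power - 10 * n * log10(d)  =>  d = 10 ** ((tx_power - rssi) / (10 * n))
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * n))


def is_close_contact(rssi_dbm: float, threshold_m: float = 2.0) -> bool:
    """Count a sighting as a contact if the estimated distance is within the threshold."""
    return estimate_distance_m(rssi_dbm) <= threshold_m


for rssi in (-55, -66, -80):
    print(rssi, round(estimate_distance_m(rssi), 2), is_close_contact(rssi))
```

A real app would likely also require the contact to last some minimum time, since a single strong reading says little on its own.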

(Ultrasound would be another option. This would allow quite precise distance measurements based on time-of-flight. This option hasn’t been explored so much. I suspect it wouldn’t work because of technical restrictions in current smartphone operating systems.)

How will such a system work?

On a very high level:

  1. Everyone installs an app on their smartphone that constantly communicates with all the surrounding smartphones with the same app installed.

  2. The app records some kind of information about the surrounding phones that will make it possible to notify these people later on.

  3. If a user gets sick, inform all the users whose smartphones were close to this user’s smartphone. Ask/instruct them to self-quarantine and contact health authorities so that they can get tested.

How should we fill out the details in such a way as to create a privacy-preserving system?

What are the main dangers of a Bluetooth-based contact tracing solution?

I think most people who are in favour of decentralized solutions are most worried about two scenarios that might arise if Bluetooth-based contact tracing is implemented without thought to privacy. (Below, we call anyone who tries to do something that we think they shouldn’t do the “adversary”.)

  1. Worry #1: An adversary will be able to track the movements of all the users of the app.
    Using Bluetooth instead of GPS already makes this harder, because Bluetooth doesn’t inherently contain location information. Still, someone who places Bluetooth devices in many spots around a city may be able to record communication between devices and recognize the same device in multiple places.
    Even if it is not known “who” a device belongs to, where that device went and when will tell the adversary a lot about its owner, and combined with other contextual information it will easily reveal identities.

  2. Worry #2: An adversary will be able to record big parts of the “contact graph” of app users.
    (Below is the best picture I could find on Google for “contact graph”.)
    Imagine that the dots represent devices and edges represent contacts between devices. Shorter edges represent repeated and longer contacts, longer edges represent brief or one-time contacts. Initially you might not know “who” a certain dot is. But imagine that you can find out through some other source of information that two of the green dots in the cluster in the middle are employees of the same company. Now you can guess that all of the green dots in that cluster are employees of that company. Now, from their connections, you find the families of those employees, and the companies that they work for. And by building on information that you already have, you can learn more and more about who the dots in your graph are and who they are in contact with.
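To make Worry #2 concrete, here is a toy sketch of how one piece of side information spreads through a recorded contact graph. The device pseudonyms and edges are made up; the point is only that a breadth-first walk from a single identified node quickly reaches its cluster and then the clusters beyond it.

```python
from collections import defaultdict, deque

# Hypothetical contacts an adversary might have recorded: (device_a, device_b).
# The device names are opaque pseudonyms; no identities are known yet.
contacts = [
    ("d1", "d2"), ("d2", "d3"), ("d1", "d3"),  # a tightly connected cluster (colleagues?)
    ("d3", "d4"),                              # a link out of the cluster (family?)
    ("d4", "d5"), ("d5", "d6"),
]


def build_graph(edges):
    """Undirected adjacency sets: device -> devices it was in contact with."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def within_hops(graph, start, max_hops):
    """Breadth-first search: every device reachable from `start` in <= max_hops contacts."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


graph = build_graph(contacts)
# Suppose side information reveals who owns d1. Its direct contacts become
# candidate colleagues, and each further hop leads to families and other employers.
print(sorted(within_hops(graph, "d1", 1)))
print(sorted(within_hops(graph, "d1", 3)))
```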


Proponents of decentralized solutions believe that it is very important to prevent these two scenarios, because they give a single entity a lot of information about a lot of people - and therefore also a lot of power.
We are aware that we live in a world where big companies and governments are already collecting this kind of data in big quantities. We don’t want to help them even more.

There are a lot of other things that you should also think about. We just think that as a society these are the two things to be most wary about.

How?

This is again a comic by Nicky Case who has created these illustrations for DP3T:

This system prevents Worry #1 by constantly changing the random message that is broadcast. An adversary who sees the same device in different locations will not know that it is the same device.
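A toy comparison of the two cases, assuming an adversary who records broadcasts at a few fixed locations. It is a sketch of the idea in the comic rather than of any specific protocol; the 96 intervals per day (one every 15 minutes) and the sniffer positions are arbitrary.

```python
import secrets


def static_broadcasts(n_intervals):
    """A naive app that broadcasts the same identifier all the time."""
    fixed = secrets.token_hex(16)
    return [fixed] * n_intervals


def rotating_broadcasts(n_intervals):
    """An app that broadcasts a fresh random message in every interval."""
    return [secrets.token_hex(16) for _ in range(n_intervals)]


def linkable_sightings(broadcasts, sniffer_intervals):
    """Size of the largest group of sniffer sightings sharing one identifier,
    i.e. how many sightings the adversary can attribute to the same device."""
    seen = [broadcasts[i] for i in sniffer_intervals]
    return max(seen.count(msg) for msg in set(seen))


# The device passes three of the adversary's sniffers during the day,
# at intervals 3, 40 and 90, in different parts of the city.
sniffed_at = [3, 40, 90]
print(linkable_sightings(static_broadcasts(96), sniffed_at))    # 3: one continuous track
print(linkable_sightings(rotating_broadcasts(96), sniffed_at))  # 1 with overwhelming probability
```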

An adversary who places Bluetooth devices around the city will be able to tell whether infected people have walked past by checking the list published by the hospital. This is possible in any system that notifies people when they have been close to an infected person: the adversary could always just leave lots of smartphones all over the city. Ideally, though, the adversary will still not be able to tell whether the infected person their Bluetooth listening device picked up in one location was the same person as the one it picked up in another location. In the protocol described in the comic this is probably the case as long as there are enough infected people on any given day. If the list for a certain day only contains messages from a single person, then of course the adversary knows that all of those messages are from that person and can track them in areas where it has Bluetooth devices.

This is not ideal, but it cannot be avoided completely and from a global viewpoint it is much better than the adversary being able to track everyone. There is much less to be gained from this from a power perspective than from being able to track everyone.

The system prevents Worry #2 by only sending the “what I said” list to the hospital. These are completely random messages, and the hospital has no way of knowing that someone else heard these messages.
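The comic's protocol is small enough to model in a few lines. This is a toy sketch, assuming 16-byte random messages and in-memory lists; real apps add timing, signal-strength filtering and a real server. Note that the exposure check runs on Bob's phone and the hospital only ever sees Alice's "what I said" list.

```python
import secrets


class Phone:
    """Toy model of the comic's protocol: say random messages, remember what you heard."""

    def __init__(self):
        self.said = []      # "what I said" list: messages this phone broadcast
        self.heard = set()  # "what I heard" list: messages received from nearby phones

    def new_message(self):
        msg = secrets.token_bytes(16)  # fresh random message, unlinkable to earlier ones
        self.said.append(msg)
        return msg

    def hear(self, msg):
        self.heard.add(msg)


def meet(a, b):
    """Two phones within range exchange their current messages."""
    b.hear(a.new_message())
    a.hear(b.new_message())


class Hospital:
    def __init__(self):
        self.published = set()  # the list that everyone downloads

    def report_positive(self, phone):
        # Only the "what I said" list is uploaded, never the "what I heard" list.
        self.published.update(phone.said)


def check_exposure(phone, hospital):
    """Runs locally on the phone: did I hear any message from an infected person?"""
    return bool(phone.heard & hospital.published)


alice, bob, carol = Phone(), Phone(), Phone()
meet(alice, bob)
hospital = Hospital()
hospital.report_positive(alice)
print(check_exposure(bob, hospital))    # True
print(check_exposure(carol, hospital))  # False
```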

There are a lot of ways of getting this part wrong. Imagine, for example, that instead of sending only the “what I said” list, Alice also sends the “what I heard” list. And maybe Bob also gets diagnosed later. When Bob uploads his “what I said” list, the hospital can find some of the messages from that list in Alice’s “what I heard” list and deduce that the two met each other.
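A minimal sketch of this flawed variant, assuming the hospital simply keeps every upload per patient. Matching one patient's "heard" list against another's "said" list is all it takes to recover an edge of the contact graph; with only "said" lists uploaded, the same check finds nothing.

```python
import secrets

# Alice and Bob meet: each hears the other's random message.
m_alice = secrets.token_bytes(16)
m_bob = secrets.token_bytes(16)
alice_said, alice_heard = {m_alice}, {m_bob}
bob_said = {m_bob}

# Hypothetical flawed design: Alice uploads BOTH lists when diagnosed.
uploads = {"alice": {"said": alice_said, "heard": alice_heard}}
# Later Bob is diagnosed too and uploads only his "what I said" list.
uploads["bob"] = {"said": bob_said}


def inferred_meetings(uploads):
    """What the hospital can deduce: x met y if y's said-list overlaps x's heard-list."""
    pairs = set()
    for x, lists_x in uploads.items():
        for y, lists_y in uploads.items():
            if x != y and lists_x.get("heard", set()) & lists_y["said"]:
                pairs.add((x, y))
    return pairs


print(inferred_meetings(uploads))  # {('alice', 'bob')}: the hospital learns they met
```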

What would a centralized Bluetooth contact tracing system look like?

In fact, the centralized Bluetooth contact tracing systems already deployed in some countries (for example TraceTogether, which was developed by Singapore’s government) work something like this:

They start out like in the comic. But in their system all users register an account with the government server and the server knows all the messages that everyone is going to say. When someone gets sick, they upload all the messages they have “heard”. The government server knows who “said” those messages and can contact that person directly. In this system – by design – the government server knows all contact events where at least one of the involved people is sick, i.e. it knows who Alice met and when. And, if the government wants to, it can place Bluetooth devices around the city, and now it can track everyone’s movements, because for every message that one of those Bluetooth devices picks up, it knows who said that message.
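A toy model of the centralized design just described. It is not TraceTogether's actual protocol: here the server is modeled as a plain lookup table from message to registered user (a real system might, for example, encrypt the user ID into the message instead, which gives the server the same ability). Both capabilities from the paragraph fall out directly: resolving an infected user's contacts, and turning sniffer sightings into named tracks.

```python
import secrets


class CentralServer:
    """Toy centralized design: the server issues, and therefore knows, every message."""

    def __init__(self):
        self.owner_of = {}  # message -> registered user

    def issue_messages(self, user, count):
        msgs = [secrets.token_bytes(16) for _ in range(count)]
        for m in msgs:
            self.owner_of[m] = user
        return msgs

    def report_positive(self, heard_messages):
        """An infected user uploads what they HEARD; the server resolves every contact."""
        return {self.owner_of[m] for m in heard_messages if m in self.owner_of}

    def track(self, sightings):
        """Sightings from server-operated sniffers: (place, message) -> (place, user)."""
        return [(place, self.owner_of.get(m)) for place, m in sightings]


server = CentralServer()
alice_msgs = server.issue_messages("alice", 3)
bob_msgs = server.issue_messages("bob", 3)

# Alice heard one of Bob's messages, then tests positive.
print(server.report_positive([bob_msgs[0]]))  # {'bob'}: the server learns who Alice met
# Sniffers at the station and near an office pick up Alice's messages.
print(server.track([("station", alice_msgs[0]), ("office", alice_msgs[2])]))
```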

What are TCN, DP^3T, PACT, STRICT, Whisper etc.? Why are there so many protocols?

The comic gives a pretty good general idea of how a decentralized contact tracing system can work. Still, there are a couple of details to be worked out. Depending on how many new cases there are per day, the list that the hospital keeps in the comic can get quite big: it has to keep fourteen days’ worth of random messages for every infected user. And every user has to download that list. People might not be happy about downloading so much data.

Many protocols have found the following solution to this problem:
Instead of inventing a completely random message every time, Alice’s phone generates a random key in the beginning and then from that key it calculates messages that look completely random to anyone who doesn’t know the key, but which can be recalculated from the key. When Alice gets sick, she uploads her key to the hospital and when Bob wants to check if he has seen an infected person, he only downloads the list of keys of infected people and his phone regenerates the pseudorandom messages that were generated from those keys.
This requires downloading a lot less data. But it also makes it impossible for the “hospital” to shuffle the random messages of infected people among each other in the database that people download, so now anyone who places Bluetooth devices around the city can group together random messages from each sick person and can therefore track the movements of sick people.
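A generic sketch of the key-based scheme, using HMAC-SHA256 as the pseudorandom function. TCN, DP^3T and the others each specify their own derivation, so this is not any particular protocol's construction; the 16-byte message length and 96 messages per key are assumptions. The last lines also show the downside: whoever holds the published key can regenerate, and therefore group, all of Alice's messages.

```python
import hashlib
import hmac
import secrets

MESSAGE_LEN = 16  # bytes per broadcast message (assumed)


def derive_messages(key: bytes, count: int) -> list[bytes]:
    """Derive `count` pseudorandom messages from one secret key.

    Without the key the outputs look random; with it, anyone can recompute them.
    """
    return [
        hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()[:MESSAGE_LEN]
        for i in range(count)
    ]


# Alice's phone: one random key, many broadcast messages.
alice_key = secrets.token_bytes(32)
alice_broadcasts = derive_messages(alice_key, 96)

# Bob's phone heard two of them.
bob_heard = {alice_broadcasts[10], alice_broadcasts[11]}

# Alice tests positive and uploads only her 32-byte key instead of 96 messages.
published_keys = [alice_key]

# Bob downloads the keys and regenerates the messages locally.
exposed = any(bob_heard & set(derive_messages(k, 96)) for k in published_keys)
print(exposed)  # True
```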

Depending on what your guess is for the expected number of infected people, this may be a necessary tradeoff, because otherwise there is going to be too much data. You will want to improve the situation a little by changing the generating key every once in a while. Maybe every day or every week. Then Alice can reveal only the keys for the last two weeks, so that people can’t track her further into the past than necessary. And maybe the adversary can’t guess that two different keys both came from Alice and so he doesn’t have one continuous track for Alice, but only pieces of one day each.
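A minimal sketch of daily key rotation with a two-week upload window, assuming each day gets a fresh, independent random key (so keys from different days are not linked by any derivation). Some protocols instead derive each day's key from the previous one; that trade-off is not modeled here. The dates are illustrative.

```python
import secrets
from datetime import date, timedelta

RETENTION_DAYS = 14  # only the last two weeks are ever revealed


class DailyKeys:
    """A fresh, independent random key per day (hypothetical scheme)."""

    def __init__(self):
        self.keys = {}  # date -> key

    def key_for(self, day: date) -> bytes:
        if day not in self.keys:
            self.keys[day] = secrets.token_bytes(32)
        return self.keys[day]

    def keys_to_upload(self, today: date) -> dict:
        """Keys from the last RETENTION_DAYS days only, including today."""
        cutoff = today - timedelta(days=RETENTION_DAYS - 1)
        return {d: k for d, k in self.keys.items() if cutoff <= d <= today}


phone = DailyKeys()
start = date(2020, 3, 1)
for offset in range(30):  # 30 days of app use
    phone.key_for(start + timedelta(days=offset))

diagnosis_day = start + timedelta(days=29)
upload = phone.keys_to_upload(diagnosis_day)
print(len(upload), min(upload), max(upload))  # 14 2020-03-17 2020-03-30
```

Keys older than two weeks could also simply be deleted on the phone, so they cannot be revealed by accident later.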

All of the following protocols propose some variant of this scheme:

The DP^3T paper also calculates that with slightly optimistic assumptions on the number of new infections per day (they assume 2000 infections per day in a small country like Switzerland) and if you make the random messages as short as you can without risking too many phones randomly saying the same things¹, you can actually fit the full daily database into 5.5MB, which most people are probably going to be ok with downloading.
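A back-of-the-envelope sketch of the trade-off behind that figure. It does not reproduce DP^3T's own calculation, which uses its own, more compact encoding; the 15-minute message rotation and the 1000 messages heard per phone per day are hypothetical. It shows why shipping raw 16-byte messages is expensive, and why messages cannot be shortened arbitrarily: the expected number of accidental matches grows as messages get shorter.

```python
def daily_download_bytes(new_infections_per_day, days_uploaded,
                         messages_per_day, bytes_per_message):
    """Size of one day's additions to the published list if raw messages are shipped."""
    return new_infections_per_day * days_uploaded * messages_per_day * bytes_per_message


def expected_false_matches(messages_heard, messages_published, bits_per_message):
    """Expected accidental collisions between what one phone heard and the
    published list, assuming messages are uniformly random bit strings."""
    return messages_heard * messages_published / 2 ** bits_per_message


# Hypothetical parameters: 2000 new infections per day (the figure quoted above),
# 14 days uploaded per infection, one new message every 15 minutes (96 per day).
entries = 2000 * 14 * 96
print(entries)                                       # 2688000
print(daily_download_bytes(2000, 14, 96, 16) / 1e6)  # about 43 MB with 16-byte messages
for bits in (32, 48, 64):
    print(bits, expected_false_matches(1000, entries, bits))
```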

Why are there so many of these protocols that are so similar?

I think the answer is just that everyone started working on the same problem at the same time, and there aren’t really so many different possible solutions to the problem, so many people found very similar answers.

There are also a good number of lower-impact attacks that adversaries could come up with, which you should do your best to prevent when designing a protocol, and lots of people have spent many hours thinking about these details as well.

What is the role of Apple and Google in all this? Should we be scared that they have also presented a contact tracing protocol?

Perhaps surprisingly, the answer to the second question is: “Actually, this time round, I think no.”

There is one detail that doesn’t get talked about so much publicly, which is:
Up to now, as far as I know, no one has been able to get this working properly and reliably on Apple devices. The reason is that Apple restricts what apps on iPhones can do when the app is not visible in the foreground. Even though the documentation says that iPhones can use Bluetooth in the background, this just doesn’t quite work as advertised. The bottom line is that when two iPhones both have our app running in the background, they will not detect each other, at least sometimes. This is on top of many other undocumented or badly documented problems with how Bluetooth is implemented on iOS and Android devices.

This leads to solutions like the one in the TraceTogether app from Singapore, which asks users to simply always leave the app in the foreground. Some apps even create “fake lockscreens” so that you can leave the app in the foreground and still put the phone in your pocket. We were considering complicated workarounds: in a situation where there are multiple iOS devices and at least one Android device close to each other, the Android device would tell the iOS devices about each other. (Connections to Android devices happen to work, even when the iOS device has the app in the background.) None of this is very satisfactory.

Many people have been bugging Apple (and to a lesser extent Google) to please fix these flaws in their Bluetooth implementation, or to provide some secret workaround to them.

Now a few days ago they surprised everyone (or me at least) by coming forward not with a promise of a simple fix to the Bluetooth problems, but instead with a promise of providing big parts of their own contact tracing solution at the operating system level.
Their proposed solution is practically identical to TCN/PACT/STRICT/Whisper/“low-cost” DP^3T.

If you look at the API (Application Programming Interface) description in their announcement, this is what they want to implement at the OS level:

What is left to apps to implement is interacting with the user, downloading “infected keys” from an app/country-specific server, and reporting user infections. Neither Apple nor Google provides the server.

Interestingly, they do not allow apps to directly access the “heard” messages. This has the consequence that anyone who wants to use the API they provide is not left with much choice beyond actually using the exact protocol they propose.
In particular, centralized contact tracing protocols cannot be implemented on top of the Apple/Google API.

Hiding the local database of “heard” messages from apps also has the effect of making it much harder for the owner of the device to extract these messages. This addresses one worry which proponents of the centralized model had about the decentralized model. They feared that for example a nasty boss would look at his database of “heard” messages to find out the exact time when he was close to someone who had COVID-19, and then identify and fire that person.
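A toy model of the design choice described in the last two paragraphs, with hypothetical names: the store of "heard" messages offers no way to read it back, only a match count against downloaded messages. In Python the underscore is just a convention; on a phone the real enforcement would come from the operating system isolating this data from apps. The actual Apple/Google interface exposes its own set of details, which are not reproduced here.

```python
import secrets


class ExposureStore:
    """Hypothetical OS-side store: apps can add sightings and ask for matches,
    but there is deliberately no method that returns the heard messages."""

    def __init__(self):
        self._heard = set()  # stays inside the OS in the real design

    def record(self, message: bytes):
        self._heard.add(message)

    def matches(self, infected_messages) -> int:
        """The only answer apps get in this toy: how many heard messages match."""
        return len(self._heard & set(infected_messages))


store = ExposureStore()
infected_msg = secrets.token_bytes(16)
other_msg = secrets.token_bytes(16)
store.record(infected_msg)
store.record(other_msg)
print(store.matches([infected_msg]))  # 1, without revealing which sighting it was
```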

So, at least for now, it looks like they have used their power here to force a decision in our favour.

Of course, this isn’t a reason to trust them completely.

Things to be watchful of:


  1. This would lead to people getting false-positive alerts because they “heard” someone else say the same random message as an infected person.