
Deanonymizable/extractable data

Open FishmanL opened this issue 5 years ago • 10 comments

This platform, as currently defined, allows extraction of the location of everyone else who has entered data, via search-decision reductions; it needs DP.

FishmanL avatar Mar 24 '20 18:03 FishmanL

Thanks @FishmanL for your comment. Can you spell out DP for clarity, please?

Would you also mind elaborating on the search-decision reduction argument, and suggesting how to mitigate this shortcoming?

Thank you 🙏

lacabra avatar Mar 24 '20 18:03 lacabra

Sure: DP = differential privacy

Search-decision: first you execute a coordinated attack where you drop new pins on a grid across the area you want to search in order to locate every case; then you repeatedly put in your location at slightly different places to figure out/triangulate the precise location of every person with the virus.

You can mitigate this by adding a small amount of noise to each person's initial location, and increasing the amount of added noise over time.
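A minimal sketch of that mitigation, assuming independent per-coordinate Laplace noise (a simplification of planar-Laplace / geo-indistinguishability schemes; the `rand` crate and all names here are illustrative, not part of SafeTrace):

```rust
// Toy sketch: add per-coordinate Laplace noise to a location before it
// enters the dataset. Requires the external `rand` crate.
use rand::Rng;

/// Draw a sample from Laplace(0, scale) by inverse-CDF sampling.
fn laplace(scale: f64) -> f64 {
    let u: f64 = rand::thread_rng().gen_range(-0.5..0.5);
    -scale * u.signum() * (1.0 - 2.0 * u.abs()).ln()
}

/// Perturb a (lat, lon) pair. `scale_deg` is the noise scale in degrees;
/// increasing it for older records is how the "more noise over time" part
/// would work.
fn noisy_location(lat: f64, lon: f64, scale_deg: f64) -> (f64, f64) {
    (lat + laplace(scale_deg), lon + laplace(scale_deg))
}

fn main() {
    let (lat, lon) = (40.4168, -3.7038);               // example raw point
    println!("{:?}", noisy_location(lat, lon, 0.001)); // roughly 100 m of noise
}
```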

FishmanL avatar Mar 24 '20 19:03 FishmanL

@FishmanL we are currently working on a limited MVP that we can build this on. It would be great to add this to the development roadmap and implement it as we roll out.

Are there off-the-shelf DP libraries in Rust that you can point us to?

cankisagun avatar Mar 24 '20 20:03 cankisagun

None that I know of.

FishmanL avatar Mar 24 '20 21:03 FishmanL

Thanks for your insights @FishmanL. I would like to challenge your assumptions here, because I question whether what you propose applies to the workflow we envision, which is as follows:

  1. Users who tested positive add timestamped locations to the dataset inside the enclave
  2. An attacker who wants to de-anonymize data knows neither the number of users who have uploaded data nor the number of locations each user has entered. When the attacker queries the enclave for a match, she will get a timestamped location where at least one individual who tested positive has been within a parametrizable radius r (which we can set to be no smaller than a given threshold) and a time interval t (again no smaller than a threshold) for a given time and location, but she will not know whether there was one individual or more, nor will she get any userid for that match (a rough sketch of this matching follows below).
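To make the matching semantics concrete, here is a rough sketch, assuming the enclave reduces each query to a single yes/no answer over a haversine-distance and time-window check (types and names are illustrative, not the actual SafeTrace enclave code):

```rust
// Toy sketch of the match query: the enclave holds timestamped locations of
// positive-tested users and answers only whether at least one of them falls
// within radius `r_meters` and time window `t_secs` of the queried point.

struct Record {
    lat: f64,       // degrees
    lon: f64,       // degrees
    timestamp: i64, // unix seconds
}

/// Great-circle distance in meters (haversine formula).
fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let to_rad = std::f64::consts::PI / 180.0;
    let (dlat, dlon) = ((lat2 - lat1) * to_rad, (lon2 - lon1) * to_rad);
    let a = (dlat / 2.0).sin().powi(2)
        + (lat1 * to_rad).cos() * (lat2 * to_rad).cos() * (dlon / 2.0).sin().powi(2);
    2.0 * 6_371_000.0 * a.sqrt().asin()
}

/// Yes/no only: no counts, no user ids are returned.
fn any_match(records: &[Record], q: &Record, r_meters: f64, t_secs: i64) -> bool {
    records.iter().any(|rec| {
        (rec.timestamp - q.timestamp).abs() <= t_secs
            && haversine_m(rec.lat, rec.lon, q.lat, q.lon) <= r_meters
    })
}

fn main() {
    let dataset = vec![Record { lat: 40.4168, lon: -3.7038, timestamp: 1_585_000_000 }];
    let query = Record { lat: 40.4170, lon: -3.7040, timestamp: 1_585_000_600 };
    // e.g. r = 100 m, t = 30 minutes
    println!("{}", any_match(&dataset, &query, 100.0, 30 * 60));
}
```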

So my question is: how could she leverage this query interface to obtain any information about any user in the set, if those individuals take the precaution of not including home addresses or other locations that can uniquely identify them by themselves? I understand how DP works, but I don't think it applies to the data flow we are envisioning.

Thoughts?

lacabra avatar Mar 25 '20 05:03 lacabra

A constant circle is no better than a single point, since you can figure out the bounds of the circle with enough queries and just note the center. (In fact, no deterministic answering procedure solves this issue; you need randomness.) The same is true for time.

This doesn't fully deanonymize users by itself; it just lets you recover exact times and locations (the number of individuals is also recoverable by repeated queries, which lets you split overlapping circles into multiple separate ones). How you get from there to actual users is... let's say 'left to the reader' (in smaller towns it's trivial, in cities it's harder).
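To illustrate why a deterministic constant-radius answer leaks the underlying point, here is a toy sketch of the boundary-probing attack in planar coordinates, with a hypothetical yes/no `matches` oracle simulated locally (nothing here is SafeTrace code):

```rust
// Toy demonstration: given only a boolean "is any case within radius r of
// this point?" answer, an attacker who has found one matching grid point can
// recover the circle's center by binary search along two chords.

const EPS: f64 = 1e-6;

/// The decision oracle the attacker can query (simulated here).
fn matches(center: (f64, f64), r: f64, q: (f64, f64)) -> bool {
    let (dx, dy) = (q.0 - center.0, q.1 - center.1);
    (dx * dx + dy * dy).sqrt() <= r
}

/// Binary-search from an interior point `p` along direction `dir` for the
/// edge of the match region (the matched set along a ray is an interval).
fn boundary(center: (f64, f64), r: f64, p: (f64, f64), dir: (f64, f64)) -> (f64, f64) {
    let (mut lo, mut hi) = (0.0_f64, 10.0 * r); // `hi` starts well outside the circle
    while hi - lo > EPS {
        let mid = 0.5 * (lo + hi);
        let q = (p.0 + mid * dir.0, p.1 + mid * dir.1);
        if matches(center, r, q) { lo = mid } else { hi = mid }
    }
    (p.0 + lo * dir.0, p.1 + lo * dir.1)
}

fn main() {
    let (center, r) = ((3.7, -1.2), 0.5); // hidden location and reporting radius
    let hit = (3.6, -1.0);                // any grid point that returned "match"

    // East/west edges of the horizontal chord through `hit`; the chord's
    // midpoint shares its x-coordinate with the circle's center.
    let east = boundary(center, r, hit, (1.0, 0.0));
    let west = boundary(center, r, hit, (-1.0, 0.0));
    let cx = 0.5 * (east.0 + west.0);

    // North/south edges along the vertical line through (cx, hit.1), which
    // passes through the center, so the midpoint recovers its y-coordinate.
    let p = (cx, hit.1);
    let north = boundary(center, r, p, (0.0, 1.0));
    let south = boundary(center, r, p, (0.0, -1.0));
    let cy = 0.5 * (north.1 + south.1);

    println!("recovered center ~ ({:.4}, {:.4}); true center = {:?}", cx, cy, center);
}
```

Each binary search converges to within EPS of the boundary in O(log(r/EPS)) queries, so the attack is cheap once a match is found.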

FishmanL avatar Mar 25 '20 10:03 FishmanL

@FishmanL would rate-limiting queries and deleting trailing data (i.e., anything older than 14 days) reduce the risk here? As I understand your model, an attacker is essentially creating a series of fake data sets and modifying the time and location slightly every time to "scan" for matches. This could be addressed by, say, only allowing once-per-day-per-user updates, or possibly by trying to ID and limit this behavior in the client code.
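A rough sketch of those two mitigations, assuming a per-user upload limiter and a 14-day retention prune (names and structure are illustrative only, not SafeTrace code):

```rust
// Toy sketch: at most one accepted upload per user per day, and record
// timestamps older than 14 days are pruned from the dataset.
use std::collections::HashMap;

const DAY_SECS: i64 = 24 * 60 * 60;
const RETENTION_SECS: i64 = 14 * DAY_SECS;

struct UploadLimiter {
    last_upload: HashMap<String, i64>, // user id -> unix time of last accepted upload
}

impl UploadLimiter {
    fn new() -> Self {
        Self { last_upload: HashMap::new() }
    }

    /// Accept an upload only if this user has not uploaded in the last 24 h.
    fn try_upload(&mut self, user_id: &str, now: i64) -> bool {
        if let Some(&t) = self.last_upload.get(user_id) {
            if now - t < DAY_SECS {
                return false;
            }
        }
        self.last_upload.insert(user_id.to_string(), now);
        true
    }
}

/// Drop trailing data: keep only record timestamps from the last 14 days.
fn prune(timestamps: &mut Vec<i64>, now: i64) {
    timestamps.retain(|&t| now - t <= RETENTION_SECS);
}

fn main() {
    let mut limiter = UploadLimiter::new();
    let now = 1_586_000_000;
    assert!(limiter.try_upload("alice", now));
    assert!(!limiter.try_upload("alice", now + 60)); // second upload same day: rejected

    let mut history = vec![now - 20 * DAY_SECS, now - 3 * DAY_SECS];
    prune(&mut history, now);
    assert_eq!(history, vec![now - 3 * DAY_SECS]);
    println!("ok");
}
```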

ainsleys avatar Apr 08 '20 15:04 ainsleys

I mean, I don't see any way to handle scaling this to lots of users (which is the only way it's really useful) without risking "an attacker makes lots of fake users that are near each other".

FishmanL avatar Apr 08 '20 15:04 FishmanL

Yeah, it's worth looking into what the best options are for making it difficult or expensive to create a ton of fake users without compromising privacy. We could require sign-on with some service that provides a layer of Sybil protection.

ainsleys avatar Apr 08 '20 15:04 ainsleys

See #43 for the current work on this, @FishmanL.

ainsleys avatar Apr 09 '20 18:04 ainsleys