SafeTrace icon indicating copy to clipboard operation
SafeTrace copied to clipboard

Global view SafeTrace algorithm

Open cankisagun opened this issue 4 years ago • 5 comments

This issue builds on issue #2

In order to achieve a certain level of privacy, we propose to run a lightweight clustering algorithm inside the SafeTrace secure DB and passing results to the maps API. This prevents any specific user data to leak to the map API.

The clustering algorithm should return << lan, lat, weight >> to the maps API for each data point to be represented on the map

Next steps:

  • Determine how to account for high risk symptoms in the global view algorithm (Suggested approach: overlay both positive covid19 and high risk symptoms on the same map with different colors)
  • Determine clustering logic based on sample data size (i.e. create 100 clusters for 10K users vs. 5 clusters for 10 users)
  • Assess Differential Privacy options

UPDATE: The clustering algorithm runs inside the enclave similar to individual reporting algorithm, which can be found here

cankisagun avatar Mar 30 '20 22:03 cankisagun

  • I agree on the positive vs high risk in different colors. While there is correlation, I expect that if you drew the curves, you'd see changes in high risk frequency for a given area not to be in the same time step as positive cases. Visualise it, and you may find a leading indicator relationship (I'd consider the first and second derivatives wrt time). You might even be able to develop a predictor using a regression model.
  • The clustering algorithm is simple: duplicate points according to weight, the run k-means (easy with cogset in Rust).
  • Choosing the number of clusters is not an easy problem to solve, and generally is difficult to do without a human present. Silhouetting is the most straightforward way to automate cluster number choice, though you can also run through many numbers of clusters and minimise MSE while not having too many clusters (elbow method). These approaches typically take a lot of effort to program reliably, and a human observing the data based on context tends to be the better option. I would allow the user to input the number of clusters, and maybe display a number selected by yourselves by default.
  • In terms of preserving privacy, it's pretty easy: output cluster centroids, rather than the data points.

theoturner avatar Apr 01 '20 12:04 theoturner

Thanks @theoturner for your outline and proposed solution. It makes sense at a high level. Given that I sense your familiarity with Rust, would you be in a position to submit a PR to introduce these improvements?

Alternatively, you could post small snippets of code as per your comments to this thread, and someone else could "glue" them together in fully functioning code and submit a PR instead.

Thoughts?

lacabra avatar Apr 03 '20 16:04 lacabra

K-means in Rust for lat/long:

We assume that lat/long is representative of a flat plane. For most cases this is fine, particularly smaller surface areas (@akhil325 could potentialy comment further). I've personally never had problems with this abstraction and clustering, and given the ongoing dispute about 2D projections of the globe, It's likely not worth your time to consider projections.

Assume you have locations:

struct Location {
    latitude: f64, // Or whatever numerical type
    longitude: f64,
}

Cluster (assume argument num_clusters e.g user-supplied - I'd advise this before attempting silhouetting):

use cogset::{Euclid, Kmeans};
let mut eucvec: Vec<Euclid<_>> = Vec::new();
for point in &locations {
    eucvec.push(Euclid([point.latitude as f64, point.longitude as f64]));
}
let kmeans = Kmeans::new(&eucvec, num_clusters as usize);
let clusters = kmeans.clusters();

Outputting just the centroids:

let mut clustvec: Vec<(f64, f64)> = Vec::new();
for result in &clusters {
    clustvec.push(((result.0).0[0], (result.0).0[1]));
} 
format!("{:?}", clustvec)

theoturner avatar Apr 06 '20 10:04 theoturner

Drawing it on Google Maps

I also see you've linked an issue re: google maps. Cluster data is relatively easy to draw on maps, e.g. in React.

First, let's create a drawable var of our points, which you could extract from your props, e.g. say you had prop clusters imported from your Rust:

const Point = ({ text }) => <div>{text}</div>;
var points = [];
for (var i = 0; i < this.props.clusters.length; i++) {
    points.push(<Point
        lat={this.props.clusters[i][0]}
        lng={this.props.clusters[i][1]}
        text="X"
    />);
}

Then, draw a google map:

import GoogleMapReact from 'google-map-react';
<GoogleMapReact
bootstrapURLKeys={{ key: process.env.REACT_APP_API_KEY }}
defaultCenter={{ lat: 51.507221, lng: -0.127600}}
defaultZoom={11}
{points}
</GoogleMapReact>

theoturner avatar Apr 06 '20 10:04 theoturner

For more details, see my implementation of clustering + maps for Enigma, APEX. In this implementation, the contract only stores ints and uses enigma macros such as eformat!, there's some input sanitation, and the google map is drawn with a custom theme.

theoturner avatar Apr 06 '20 10:04 theoturner