
Avoid large download and informing the server of the list of installed applications

Open tdelmas opened this issue 6 years ago • 6 comments

Today the application fetches two things:

The list of all applications known by exodus:

https://github.com/Exodus-Privacy/exodus-android-app/blob/5c7e021ad435c160601b055fe028bdb3723c106c/app/src/main/java/org/eu/exodus_privacy/exodusprivacy/manager/NetworkManager.java#L216

https://reports.exodus-privacy.eu.org/api/applications

{"applications": [
    {
      "id": 50261,
       "handle": "org.eu.exodus_privacy.exodusprivacy",
       "name": "Exodus Privacy",
       "creator": "Exodus Privacy",
       "downloads": "10,000+ downloads",
       "app_uid": "B6FECF6541A4C151B6FB1AB1D77AD012C95349DF",
       "icon_phash": "93967134568205988083521687304934410531",
       "report_updated_at": 1566315960.731061
  }
]}

For each installed application on the phone, the list of reports:

https://github.com/Exodus-Privacy/exodus-android-app/blob/5c7e021ad435c160601b055fe028bdb3723c106c/app/src/main/java/org/eu/exodus_privacy/exodusprivacy/manager/NetworkManager.java#L271

https://reports.exodus-privacy.eu.org/api/search/org.eu.exodus_privacy.exodusprivacy

{
  "org.eu.exodus_privacy.exodusprivacy": {
    "reports": [
      {
        "downloads": "10,000+ downloads",
        "creation_date": "2019-07-30T21:23:18.334Z",
        "updated_at": "2019-08-20T10:34:49.189Z",
        "trackers": [],
        "version_code": "7",
        "id": 87886,
        "version": "1.2.0"
      }
    ],
    "name": "Exodus Privacy",
    "creator": "Exodus Privacy"
  }
}

This has two drawbacks:

  • The first download is a large JSON file.
  • The second one reveals the list of installed applications to the server.

These two downloads could be combined to reduce the download size while protecting privacy:

A single entry point:

https://reports.exodus-privacy.eu.org/api/search/<SHA256>

Where <SHA256> is the beginning (4 characters, for example) of the SHA256 of the app id (org.eu.exodus_privacy.exodusprivacy). The server answers with all applications (with their reports) whose checksum starts with that prefix:

{"applications": [
    {
      "id": 50261,
       "handle": "org.eu.exodus_privacy.exodusprivacy",
       "name": "Exodus Privacy",
       "creator": "Exodus Privacy",
       "downloads": "10,000+ downloads",
       "app_uid": "B6FECF6541A4C151B6FB1AB1D77AD012C95349DF",
       "icon_phash": "93967134568205988083521687304934410531",
       "report_updated_at": 1566315960.731061,
        "reports": [
        {
            "downloads": "10,000+ downloads",
            "creation_date": "2019-07-30T21:23:18.334Z",
            "updated_at": "2019-08-20T10:34:49.189Z",
            "trackers": [],
            "version_code": "7",
            "id": 87886,
            "version": "1.2.0"
        }
        ]
    }
]}
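A minimal client-side sketch of the lookup described above, assuming the digest is hex-encoded and the prefix is simply appended to the `/api/search/` path as proposed. The helper names (`handle_prefix`, `search_url`, `keep_installed`) are hypothetical, and the endpoint does not exist in this form today.

```python
import hashlib

# Proposed entry point from this issue; an assumption, not a live API.
API_BASE = "https://reports.exodus-privacy.eu.org/api/search/"


def handle_prefix(handle: str, length: int = 4) -> str:
    """Return the first `length` hex characters of SHA-256(handle)."""
    digest = hashlib.sha256(handle.encode("utf-8")).hexdigest()
    return digest[:length]


def search_url(handle: str, length: int = 4) -> str:
    """Build the URL the client would request; only the prefix leaves the device."""
    return API_BASE + handle_prefix(handle, length)


def keep_installed(response: dict, installed: set) -> list:
    """Filter the returned bucket down to the installed apps, locally on the client."""
    return [
        app for app in response.get("applications", [])
        if app["handle"] in installed
    ]
```

The server only ever sees the prefix. Matching the bucket against the installed handles happens on the device, so the unrelated applications in the response act as cover.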

The application could decide the length of the checksum prefix to balance download size against privacy: a length of 1 will strongly protect privacy but will download 3% of the database. The full length will download only one application but will reveal it to the server. A length corresponding to 10-100 applications could be reasonable, and configurable by the user.

tdelmas avatar Sep 15 '19 20:09 tdelmas

Only applications that are installed from Google Play and have been analysed by Exodus trigger a report download, so only those may be known by the server. Each time, a random number of apps is added to create some noise for the server.

What you propose is good for privacy, but the processing time will be long. And the large amount of data will always be there at some point, if it's not already there. Each new report (a new analysis of a new version of an app) adds extra data, and downloading many reports unrelated to the user makes the amount of useless data grow significantly.

Our app is not aimed at tech people who fully understand how to configure their privacy exactly, so in my opinion a length setting for the API is not an option to add to the app. The app should stay very simple for all users.

And in fact, I think that if we really wanted to know what apps you have on your phone, this would not protect you as much as you want. We can guess what you have with the hash plus the probability of the app you may have installed, using the download numbers of these apps and the other apps you may have downloaded. It would not be as accurate as what we could do if we wanted to, and we don't want to know what you have on your phone.

Schoumi avatar Sep 16 '19 08:09 Schoumi

  • With the current solution, if the application sends, for example, 50% fake requests, the server knows the client has 50% of the applications requested. It knows each application has a 50% chance of being on the client.
  • With my proposal, if the client asks for a short hash that represents, for example, 100 applications, the server knows each application has a 1% chance of being on the client.

In both cases, "the probability of the app you may have installed" can be used. So it's a real gain in terms of privacy.
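The two percentages above can be reproduced with a few lines. This is only an illustration under a simplifying assumption: the server has no prior knowledge and treats every candidate as equally likely. The function names are hypothetical.

```python
def p_installed_with_decoys(real: int, decoys: int) -> float:
    """Chance that a requested handle is really installed, when real and
    decoy requests look identical to the server."""
    return real / (real + decoys)


def p_installed_in_bucket(bucket_size: int) -> float:
    """Chance that a given app in a returned bucket is the one the client
    wanted, assuming exactly one installed app per bucket."""
    return 1 / bucket_size


print(p_installed_with_decoys(real=20, decoys=20))  # 0.5
print(p_installed_in_bucket(100))                   # 0.01
```

Reaching the same 1% with decoy requests would take 99 fake requests for every real one, which is where the download-size argument comes back in.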

With my proposal, if necessary, the client can also send fake hash prefixes to confuse the server even more.

And when I say "the client", I am not talking only about this Android client, but about anybody who wants to build one. It's better to offer the possibility, but I agree that for this Android app we don't need to expose a "length of the hash" setting; we just need to choose one that both respects privacy and avoids downloading the entire database!

tdelmas avatar Sep 22 '19 21:09 tdelmas

This seems like a nice model to me (close to a form of differential privacy, if I remember my lectures well). With 2 characters (uppercase, lowercase, digits) we already have 3844 combinations. With the current number of apps, that returns an average of 16.67 apps per combination. However, with 3 characters it goes too far: 0.269 apps per combination for now. So 2 characters would be enough. Or do we use only lowercase characters?

Maybe the easiest way of deciding would be to create a nice table with all the possibilities?
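Such a table is quick to generate. The sketch below assumes about 64,000 known applications (roughly what the 16.67 average above implies) and a perfectly uniform spread of hashes. The alphabet sizes are examples only: 16 for hex, 36 for lowercase plus digits, 62 for uppercase, lowercase and digits.

```python
def bucket_table(total_apps, alphabets=(16, 36, 62), lengths=(1, 2, 3)):
    """Return (alphabet size, prefix length, combinations, average apps per combination)."""
    rows = []
    for size in alphabets:
        for length in lengths:
            combos = size ** length
            rows.append((size, length, combos, total_apps / combos))
    return rows


print(f"{'alphabet':>8} {'length':>6} {'combos':>12} {'avg apps':>10}")
for size, length, combos, avg in bucket_table(64_000):
    print(f"{size:>8} {length:>6} {combos:>12} {avg:>10.3f}")
```

Rows whose average falls below 1 mean most prefixes would identify a single app or none, so they give little cover.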

Gu1nness avatar Sep 24 '19 11:09 Gu1nness

> This seems like a nice model to me (close to a form of differential privacy, if I remember my lectures well). With 2 characters (uppercase, lowercase, digits) we already have 3844 combinations. With the current number of apps, that returns an average of 16.67 apps per combination. However, with 3 characters it goes too far: 0.269 apps per combination for now. So 2 characters would be enough. Or do we use only lowercase characters?
>
> Maybe the easiest way of deciding would be to create a nice table with all the possibilities?

Side question: is there a risk that hashes are not evenly spread, leading to some specific prefixes failing on privacy (e.g. ab matches 40 applications but xy only one)?

Porkepix avatar Sep 24 '19 11:09 Porkepix

That's what I want to check with the current DB. Maybe tonight. It shouldn't take long to compute and to get the stats. I hope that it will be evenly spread, but I cannot be sure: a sample of ~64k strings is not a lot.
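A sketch of what such a check could look like, run on synthetic package names because the real database is not part of this thread. It hashes 64,000 strings, groups them by hex prefix, and reports the smallest and largest bucket. The handle generator and the function name are made up for the example; with real handles only the input list would change.

```python
import hashlib
from collections import Counter


def bucket_sizes(handles, length=2):
    """Count how many handles share each `length`-character hex prefix of SHA-256."""
    return Counter(
        hashlib.sha256(h.encode("utf-8")).hexdigest()[:length] for h in handles
    )


# Synthetic stand-ins for real package names (assumption for this example).
handles = [f"com.example.app{i}" for i in range(64_000)]
sizes = bucket_sizes(handles, length=2)

print("buckets used:", len(sizes), "of", 16 ** 2)
print("smallest:", min(sizes.values()), "largest:", max(sizes.values()))
print("average:", sum(sizes.values()) / len(sizes))
```

Empty buckets do not appear in the `Counter`, so for longer prefixes it is worth comparing `len(sizes)` with the number of possible prefixes as well as looking at the minimum.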

Gu1nness avatar Sep 24 '19 12:09 Gu1nness

OR, we could make it so the entire Exodus database gets uploaded to multiple file hosts every month, and then offer the option to download it locally to the users who ~suffer paranoia~ want more privacy. Overloading the servers with unnecessary traffic is unacceptable when a better technical solution is available.

As of writing this, Exodus says that it has "261180 reports for 127692 applications". An empty report is 394 characters:

394 x 261180 = 102,904,920 bytes = 98.1377792 mebibytes. With a generous estimate that the average full report weighs six times that amount: 98.1377792 x 6 = 588.826675 mebibytes.

#85 is also a valid option that doesn't require any servers, besides the initial download of the code and network signatures.

ItsIgnacioPortal avatar May 13 '22 01:05 ItsIgnacioPortal