[DISCUSSION] Solution without closed source dependencies

Open theScrabi opened this issue 5 years ago • 105 comments

EDIT: tl;dr: We are working on an open implementation of the Google/Apple protocol (I call it PPCP, which may be wrong). You can find and contribute to our repository here: https://github.com/theScrabi/CoraLibre-android-sdk

Dear Corona-Warn-App developers, we have read your code and documentation. We like your effort and your openness towards the community. As there is a small but willing community of people who refuse to use closed-source Google or Apple dependencies and therefore cannot use the Corona-Warn-App, I'd like to discuss the possibility of a Google-free FLOSS solution.

What we know so far is that it is currently hardly possible to build the Corona app without the closed-source GMS. The reason is the Nearby Android Exposure Notification API, which implements the PPCP protocol. From what we can see in the code so far, the contact points between the Android Exposure Notification API and the Corona-Warn-App are few, as a wrapper class is used for the most common calls to the API. That's great! Also, only about 14 code files include parts of the GMS library. We therefore think it might be feasible to create a version of the app that does not require GMS.

As stated in this issue, the current maintainers have no plans to implement their own exposure notification API, but might be open to supporting an alternative if one becomes available. As we understand it so far, the part that runs the actual BLE contact tracing and provides a high-level API for exposure likelihood and key handling would need to be reimplemented. We had several thoughts on how this might be possible.

  • As it is often stated that PPCP is close to DP-3T, I wonder whether it is possible to reuse parts of the prestandard DP-3T Android SDK.
  • Prestandard DP-3T used foreground services. Would it be possible to implement a GMS-free, non-root version that always displays a notification while the advertising and receiving service is running? I know this is not a good idea, as people tend to dismiss this notification, but would it be technically feasible to replace GMS with such a solution?
  • A solution that would work on phones with root access, such as those running LineageOS or /e/, could be to introduce a privileged extension like the one used by the F-Droid app. This might make it possible to expose functions to the Corona-Warn-App that would normally only be accessible to higher-privileged applications. Maybe this could be a replacement for GMS.
  • Another way would be to break loose from smartphones completely and focus on microcontroller-based tracing beacons, for example on the ESP32 or the CCC card10. This way, not only users with LineageOS or rooted smartphones could benefit from an alternative implementation, but also people who don't use a smartphone at all.

If a community-based solution is possible, what would the next steps be?


Internal Tracking ID: EXPOSUREAPP-5775

theScrabi avatar Jun 01 '20 09:06 theScrabi

I agree, there should maybe be some abstract wrapper class or interface that encapsulates the Google Play Services stuff, especially as Google has already mentioned that the Play Services stuff may move to core Android (maybe AOSP). Huawei may also have another variant of the same API without Play Services on their newer phones. So I tend to think that another abstraction here would be a good idea, so that it's easy to add more exposure notification API code. On the other hand: if Google moves the stuff away from Google Play Services on newer devices, I am sure the API will stay the same and Google Play Services will delegate to the Android layer.
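To make the idea of such an abstraction concrete, here is a minimal sketch in Python (used only to keep the example short; the app itself is Kotlin/Java). Everything in it is hypothetical: the interface name, its methods, and the `FakeBackend` stand-in are illustrations, not the GMS API or anything from the Corona-Warn-App code base. The fake also compares keys directly, whereas a real implementation would derive identifiers from the keys and match those.

```python
from abc import ABC, abstractmethod
from typing import List


class ExposureNotificationBackend(ABC):
    """Hypothetical seam between the app and whichever EN implementation exists."""

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if this backend can run on the current device."""

    @abstractmethod
    def start(self) -> None:
        """Begin advertising and scanning."""

    @abstractmethod
    def provide_diagnosis_keys(self, keys: List[bytes]) -> int:
        """Hand over published keys; return how many matched stored contacts."""


class FakeBackend(ExposureNotificationBackend):
    """Stand-in for a GMS, Huawei, or libre implementation (illustration only)."""

    def __init__(self, available: bool, seen_keys: List[bytes]):
        self._available = available
        self._seen = set(seen_keys)
        self.running = False

    def is_available(self) -> bool:
        return self._available

    def start(self) -> None:
        self.running = True

    def provide_diagnosis_keys(self, keys: List[bytes]) -> int:
        # Simplification: direct key comparison instead of identifier matching.
        return sum(1 for key in keys if key in self._seen)


def pick_backend(candidates: List[ExposureNotificationBackend]) -> ExposureNotificationBackend:
    """Return the first candidate that reports itself as available."""
    for backend in candidates:
        if backend.is_available():
            return backend
    raise RuntimeError("no exposure notification backend available")
```

With a seam like this, the app code would only ever talk to `ExposureNotificationBackend`, and adding another implementation would mean adding one more candidate to the list passed to `pick_backend`.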

uschindler avatar Jun 01 '20 11:06 uschindler

Here is a good example of a GMS-free fork:

https://f-droid.org/de/packages/org.telegram.messenger/

While the original Telegram app uses GMS, for example for location sharing, the fork is 100% GMS-free without losing any functionality (in this case they use OpenStreetMap, IIRC).

Removing the Google dependency could also improve user acceptance by dispelling concerns about US big data companies.

ttobsen avatar Jun 01 '20 14:06 ttobsen

Personally, I can only encourage you to build a libre version of the ENA API for Android. But to be crystal clear: In the foreseeable future the Corona-Warn-App project will rely on the official Google and Apple APIs!

My personal advice to you: Fork this repo, give it a new name (e.g. CoraLibre or CoronaLibre) and integrate it with your libre ENA API. Try to change as little of the upstream code as possible so you can stay really close to this upstream repository. If you have a working fork, completely libre, and some minor changes in the upstream code base would help you to integrate the upstream commits (e.g. by doing some Dependency Injection here and there), feel free to create PRs. We will definitely not accept code changes that have an impact on the UX (e.g. the mentioned Android notification hack).

All the best, Malte

PS: Please be careful when publishing your app: Afaik you may not use the Corona-Warn-App name, brand, logo and designs.

MalteJ avatar Jun 01 '20 16:06 MalteJ

In the foreseeable future the Corona-Warn-App project will rely on the official Google and Apple APIs!

Then in the foreseeable future, it will not be possible for everyone to use the app. I'm using a Fairphone 2 with their Google-free Fairphone Open OS, no GMS available.

So the question is: Why is it necessary to use the Google APIs for this app? Are there any technical reasons?

I think the most important feature for such an app is to make it available to the widest possible range of users. But when using Google APIs this target can't be reached. :-(

ttobsen avatar Jun 01 '20 21:06 ttobsen

@MalteJ That is great! Thank you for the reply. So I guess the next thing is to somehow write a libre ENA API that is compatible with Privacy-Preserving Contact Tracing. The question is whether it can be pulled off, and whether someone has experimented with it already.

theScrabi avatar Jun 01 '20 21:06 theScrabi

In the foreseeable future the Corona-Warn-App project will rely on the official Google and Apple APIs!

Then in the foreseeable future, it will not be possible for everyone to use the app. I'm using a Fairphone 2 with their Google-free Fairphone Open OS, no GMS available.

So the question is: Why is it necessary to use the Google APIs for this app? Are there any technical reasons?

I think the most important feature for such an app is to make it available to the widest possible range of users. But when using Google APIs this target can't be reached. :-(

I think using Google's / Apple's solution is the only option for now, since developing our own "Google-free" solution would cost time and money. Please read the Exposure Notification documentation in this project for more information.

marcauberer avatar Jun 01 '20 21:06 marcauberer

@MalteJ That is great! Thank you for the reply. So I guess the next thing is to somehow write a libre ENA API that is compatible with Privacy-Preserving Contact Tracing. The question is whether it can be pulled off, and whether someone has experimented with it already.

I do not know of any project that is trying to create a compatible API. But I know a few folks who would be very interested in such a project and could maybe also help. @theScrabi Are you an Android developer? Can you start developing such an API?

MalteJ avatar Jun 01 '20 22:06 MalteJ

I am an Android developer, however I am not sure if I could write such an API, as I have little experience with Bluetooth and that API (yet). I have started reading up on the documentation and prestandard DP-3T, and I think I see some parallels, so it might be possible to create a compatible API by modifying the prestandard DP-3T Android SDK, but I don't know for certain right now. If I tried it, I'd need help.

theScrabi avatar Jun 01 '20 22:06 theScrabi

@theScrabi you will find most of the documentation here: https://www.apple.com/covid19/contacttracing/

I think once you start developing a lib, some helping hands from the community will appear 🙂

Feel free to post a link to your project in this thread!

MalteJ avatar Jun 01 '20 22:06 MalteJ

So I've been researching a bit more, as I try to pursue the idea of modifying the prestandard DP-3T Android SDK. Here is a comparison of both, based on what I have found out so far (no assurance of accuracy, please correct me if I am wrong).

DP-3T and PPCP are both decentralized and generate a new token every ~15 minutes. In both protocols that token is derived from a general key that is renewed every 24 h.

In DP-3T that key goes back to $SK_0$, which is generated only once, when the app is installed. Every 24 h a new $SK_i$ is then derived by applying a hash function to $SK_{i-1}$. $SK_i$ and a constant string are then fed into an HMAC, and the result seeds a stream cipher. Every 15 minutes, 16 bytes are taken from this stream and used as the Ephemeral ID ($EphID$), which is the ID that is broadcast via Bluetooth.

In PPCP the equivalent of $SK_i$ is the Temporary Exposure Key $Tek_i$. It is generated by a random function, so nothing is derived here. The equivalent of the $EphID$ is the Rolling Proximity Identifier. First the Rolling Proximity Identifier Key $RPIK_i$ is derived from $Tek_i$ together with the string "EN-RPIK" using an HKDF function. About every 15 minutes a Rolling Proximity Identifier $RPI_{i,j}$ is then derived by encrypting $PaddingData_j$ with $RPIK_i$ as the key. The padding data contains several pieces of information, including a counter that represents the time at which $RPI_{i,j}$ was generated. $RPI_{i,j}$ is then concatenated with the Associated Encrypted Metadata $AEM_{i,j}$, which contains encrypted information about the transmit power and the protocol version. The concatenated data is then broadcast via Bluetooth.
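The PPCP key derivation described above can be sketched with nothing but the Python standard library. This is a rough illustration, not the CoraLibre code: the function names are made up, the HKDF is a hand-written single-block version of RFC 5869 with SHA-256 and no salt, and the info strings "EN-RPIK" and "EN-AEMK" as well as the padded data layout ("EN-RPI", six zero bytes, 4-byte little-endian interval number counting 10-minute windows) are my reading of the published specification. The final AES step that turns the padded data into the RPI is left out because the standard library has no AES.

```python
import hashlib
import hmac
import struct


def hkdf_sha256(ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256 and an absent salt, for outputs up to 32 bytes."""
    if length > 32:
        raise ValueError("this sketch only implements a single HKDF output block")
    salt = bytes(32)  # RFC 5869: a missing salt is treated as HashLen zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()            # extract step
    okm = hmac.new(prk, info + b"\x01", hashlib.sha256).digest()  # expand, block 1
    return okm[:length]


def derive_rpik(tek: bytes) -> bytes:
    """Rolling Proximity Identifier Key derived from a Temporary Exposure Key."""
    return hkdf_sha256(tek, b"EN-RPIK", 16)


def derive_aemk(tek: bytes) -> bytes:
    """Associated Encrypted Metadata Key derived from the same TEK."""
    return hkdf_sha256(tek, b"EN-AEMK", 16)


def interval_number(unix_time: int) -> int:
    """Number of the 10-minute window that contains the given Unix timestamp."""
    return unix_time // 600


def padded_data(interval: int) -> bytes:
    """16-byte plaintext block that is AES-encrypted under the RPIK to form an RPI."""
    return b"EN-RPI" + bytes(6) + struct.pack("<I", interval)
```

For example, `padded_data(interval_number(1591254600)).hex()` gives `454e2d525049000000000000bb772800`. Encrypting that block with AES-128 under `derive_rpik(tek)` would then yield the broadcast identifier.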

[Image: dp-3t_ppcp]

In both cases the broadcast is sent as the payload of BLE advertising packets. If someone tests positive for Corona, he/she must upload all $SK_i$ and timestamps $i$ of the past 14 days for DP-3T, and all $Tek_i$ of the past 14 days for PPCP.

One thing that concerned me a little is that the app apparently cannot see the received $RPI_{i,j}$. The app can only hand the infected $Tek$s it received from the server to the framework and ask how high the risk is that the user got infected. It is OK that the app has no access to the $RPI_{i,j}$, but here it says that an algorithm calculates the risk based on the exposure duration, and I could not yet find any information about this algorithm.

The next thing I will do is fork the prestandard DP-3T Android SDK and try to get it working so we can start modifying it :)

theScrabi avatar Jun 02 '20 17:06 theScrabi

Here is the repository of the CoraLibre-android-sdk. So far it still contains the prestandard DP-3T android sdk.

theScrabi avatar Jun 02 '20 19:06 theScrabi

@MalteJ you said we cannot use the name or the package name of "Corona-Warn-App", which is OK. We will not, but can we keep the package paths of the Java files? If not, pulling changes from this repo will get harder, as files are going to be stored in different locations.

theScrabi avatar Jun 04 '20 22:06 theScrabi

You just shouldn't use the name for any marketing stuff. I don't care about path names. :)

MalteJ avatar Jun 04 '20 23:06 MalteJ

We are making progress. The prototype of the encryption core for the Apple/Google protocol is done. It needs to be verified by a security expert: https://github.com/theScrabi/CoraLibre-android-sdk/tree/master/coralibre-sdk/sdk/src/main/java/org/coralibre/android/sdk/internal/crypto/ppcp

@MalteJ According to this file, page 11, Apple/Google hand out test vectors if you kindly ask them. Do you know if these test vectors are publicly available, or whom I can kindly ask to get them?

theScrabi avatar Jun 05 '20 20:06 theScrabi

Great project and amazing work so far, thanks for the effort @theScrabi. One question: Does the work need to have location services on all the time to ensure the background scanning?

ra0e avatar Jun 06 '20 14:06 ra0e

Good question. I'd guess if the app uses a "privileged extension" then no; if you install it on a phone without root access, then probably yes.

theScrabi avatar Jun 06 '20 18:06 theScrabi

@theScrabi Unfortunately I cannot provide you any data that we got under NDA. But you can install the Apturi Covid app and receive their BLE beacons: https://www.apturicovid.lv/

MalteJ avatar Jun 06 '20 18:06 MalteJ

@theScrabi

but here it says that an algorithm calculates the risk based on the exposure duration, and I could not yet find any information about this algorithm.

The Apple documentation on ENExposureConfiguration explains that algorithm.
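As far as I understand the first version of that documentation, the total risk score is the product of four level values (attenuation, days since last exposure, duration, transmission risk), each between 0 and 8, and exposures scoring below a configured minimum are disregarded. The sketch below rests on that reading; the function and parameter names are made up for illustration and are not the Apple or Google API, and returning 0 for a disregarded exposure is my own modelling choice.

```python
def total_risk_score(
    attenuation_level: int,
    days_since_exposure_level: int,
    duration_level: int,
    transmission_risk_level: int,
    minimum_risk_score: int = 1,
) -> int:
    """Multiply four level values (0..8 each); return 0 if below the minimum."""
    levels = (
        attenuation_level,
        days_since_exposure_level,
        duration_level,
        transmission_risk_level,
    )
    for level in levels:
        if not 0 <= level <= 8:
            raise ValueError("level values are assumed to lie in 0..8")
    score = 1
    for level in levels:
        score *= level
    # Assumption: scores under the configured minimum count as "no exposure".
    return score if score >= minimum_risk_score else 0
```

Under these assumptions the highest possible score is 8 * 8 * 8 * 8 = 4096, and a single level of 0 zeroes out the whole exposure.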

mh- avatar Jun 06 '20 20:06 mh-

@theScrabi

According to this file, page 11, Apple/Google hand out test vectors if you kindly ask them. Do you know if these test vectors are publicly available, or whom I can kindly ask to get them?

I don't have official test vectors either, but when I wanted to verify that this implementation is correct, I used a "live" Android device (Android 9, GMS 20.18.17) to generate TEKs, RPIs and AEM (and the Raspberry Pi setup to receive RPIs and AEM). Malte asked me to provide some of these values for reference:

TEK: 2765f41dbaa6306a264391913bf48723
Derived RPI Key: 34fbbd748ee78bb386d1bab2df7b4165
Derived AEM Key: e01962eba7d4b5f40226e91ef9f6c050
RPI: 31fc2cf219fe6dc6765d860e979ba9e5 (decrypted: 454e2d525049000000000000bb772800) --> interval: 0x002877bb == 2652091; * 600: 1591254600 == GMT: Thursday, June 4, 2020 7:10:00 AM
AEM: 10413a43 (AES-ECB-encrypt of RPI using the AEM Key: 50b33a43150b72c90c1d8756eb9f864b; 50b33a43 XOR 10413a43 == 40f20000) --> TX Power Level 0xf2 == -14 dBm

Another valid combination is this

TEK: 7921b817fdb92074df5345594273756f
RPIK: eae8956644770f952871daf549c0ce7e
AEMK: a1d0bd6f94b053cf622ca88194e20611
padded data: 454e2d5250490000000000005b6d2800 --> RPI: 5811408cf8d88d2b33f73773a7c6d45f
metadata: 400c0000 --> AEM: a3e9517f
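The arithmetic in the annotations above (little-endian interval number, multiplication by 600, XOR with the keystream, signed TX power byte) can be reproduced with a few lines of Python. This is only a sketch for checking such values by hand; the function names are made up, and the keystream is taken as given, because computing it needs AES, which the standard library does not provide.

```python
import struct
from datetime import datetime, timezone


def decode_padded_data(padded_hex: str):
    """Split a decrypted RPI into its interval number and the UTC start time."""
    padded = bytes.fromhex(padded_hex)
    if len(padded) != 16 or padded[:6] != b"EN-RPI" or padded[6:12] != bytes(6):
        raise ValueError("not a valid EN-RPI padded data block")
    (interval,) = struct.unpack("<I", padded[12:])  # 4 bytes, little-endian
    start = datetime.fromtimestamp(interval * 600, tz=timezone.utc)
    return interval, start


def decode_aem(aem_hex: str, keystream_hex: str):
    """XOR the AEM with the first keystream bytes; return metadata and TX power."""
    aem = bytes.fromhex(aem_hex)
    keystream = bytes.fromhex(keystream_hex)[: len(aem)]
    metadata = bytes(a ^ k for a, k in zip(aem, keystream))
    tx_power = struct.unpack("b", metadata[1:2])[0]  # second byte, signed dBm
    return metadata, tx_power


interval, start = decode_padded_data("454e2d525049000000000000bb772800")
print(interval, start.isoformat())  # 2652091 2020-06-04T07:10:00+00:00

metadata, tx_power = decode_aem("10413a43", "50b33a43150b72c90c1d8756eb9f864b")
print(metadata.hex(), tx_power)     # 40f20000 -14
```

Applied to the second set of values, the padded data decodes to interval 2649435, and the metadata 400c0000 corresponds to a TX power of 12 dBm.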

Hope this helps - any questions, let me know :)

mh- avatar Jun 06 '20 20:06 mh-

Great project and amazing work so far, thanks for the effort @theScrabi. One question: Does the work need to have location services on all the time to ensure the background scanning?

I found the answer. Seems like you have to enable location mode in order to get Android to locate the Bluetooth device. It is a privacy-orientated feature. If it's disabled, every Android device will send the same Bluetooth-ID 02:00:00:00:00:00

ra0e avatar Jun 06 '20 22:06 ra0e

Seems like you have to enable location mode in order to get Android to locate the Bluetooth device. It is a privacy-orientated feature. If it's disabled, every Android device will send the same Bluetooth-ID 02:00:00:00:00:00

You are right, it's a privacy oriented feature. What Google wants to prevent is this scenario: An app does not get permission from the user to access location, and then still tracks the user's location by BLE-scanning for fixed-position BLE beacons (iBeacons in stores, museums, etc). Totally different use case, but leads to the requirement to ask the user for location permission, even though the location isn't used within the app.

Regarding the BDADDR: I found that my Android device (MTK chipset) advertises random BDADDRs like 4f:31:14:1c:d7:20, 59:7b:48:03:e7:c9, 60:a5:86:6f:9a:d3, 77:11:8d:93:f9:d4, ... Generally the address starts with 4, 5, 6, or 7, i.e. the two most significant bits are 01. This is a resolvable private address, see e.g. Bluetooth Core Spec 4.2. iPhones, in contrast, advertise these beacons with a random BDADDR whose msbs are 00 (non-resolvable private address). BTW, if you advertise with an undefined random BDADDR (msbs 10, such as 80:...), Android devices will happily scan the beacon, but iPhones will ignore it, so be careful with that.
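A small helper for checking which kind of random address a beacon uses could look like the sketch below. The function name is made up; the 00, 01 and 10 cases are the ones described above, while the mapping of msbs 11 to "static" is my understanding of the Bluetooth Core Spec rather than something I observed. The check only makes sense for random addresses, not for public ones.

```python
def random_address_type(bdaddr: str) -> str:
    """Classify a random BLE address by the two most significant bits."""
    octets = bdaddr.split(":")
    if len(octets) != 6:
        raise ValueError("expected six colon-separated octets")
    first_octet = int(octets[0], 16)  # most significant octet is printed first
    if not 0 <= first_octet <= 0xFF:
        raise ValueError("octet out of range")
    msbs = first_octet >> 6
    return {
        0b00: "non-resolvable private",
        0b01: "resolvable private",
        0b10: "reserved",
        0b11: "static",
    }[msbs]


for address in ("4f:31:14:1c:d7:20", "77:11:8d:93:f9:d4", "80:00:00:00:00:01"):
    print(address, random_address_type(address))
```

The first two addresses are classified as resolvable private, and the `80:...` address falls into the reserved range that iPhones ignore.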

mh- avatar Jun 07 '20 05:06 mh-

Seems like you have to enable location mode in order to get Android to locate the Bluetooth device. It is a privacy-orientated feature. If it's disabled, every Android device will send the same Bluetooth-ID 02:00:00:00:00:00

You are right, it's a privacy oriented feature. What Google wants to prevent is this scenario: An app does not get permission from the user to access location, and then still tracks the user's location by BLE-scanning for fixed-position BLE beacons (iBeacons in stores, museums, etc). Totally different use case, but leads to the requirement to ask the user for location permission, even though the location isn't used within the app.

Unfortunately, that is also different from the "official" Google Play Services approach: there, your app is NOT allowed to use location services.

This is one reason why the official CWA app can't implement both things at the same time:

  • If you use the official Google/Apple API, you are NOT allowed to access location. That's clearly stated. This is to prevent countries like France from tracking infected users.
  • If you implement your own BLE scanning (as proposed here), you have to access location for the reasons above.

Both at the same time won't work, as your app would not be accepted into the Play Store.

Actually, the approach with Google Play Services is in my opinion better for privacy! It sounds strange, but the app itself can't track the location of users if you separate the BLE tracking from the COVID-19 app (which is what Apple and Google do). The Play Services are a separate service implemented at a lower level in the operating system. They have all the permissions required to do the BLE scanning, but don't expose any privacy-critical information to the COVID-19 app. That's well thought out!

uschindler avatar Jun 07 '20 08:06 uschindler

@uschindler While I think you are right regarding the standard approach "Corona-Warn-App on a GMS Android Device", this thread is about an approach where people compile everything from source. They could easily see what the software does with their privacy critical information.

mh- avatar Jun 07 '20 08:06 mh-

I think it may be good to consider the possible consequences of publishing open implementations of EN. Normally I'm on board with fully open-source solutions, but it seems that in this particular case (distributed approach) it will enable an easy "nerd attack":

The DP3T app collects as little information as possible by design, but operates with an open protocol. This means that anyone can develop his own DP3T-like client, and possibly decide to collect more data than what DP3T meant. There could be “enriched apps” which collect more information for each encounter such as the geographic location, the exact time, more information about the Bluetooth message. The app would make its best effort to link changing EphIDi, which would not be too hard from Bluetooth metadata and signal strength if there are not many neighbors in proximity and they are all static. The app could further invite the user to enter more data such as if he knows the person, their gender, approximate age, visible ethnicity, etc, or in which circumstance this encounter occurred (e.g. in bus line x, in the elevator of building y). The enriched app could easily create a huge database. With this, an isolated malicious user could start identifying many reported cases.

It's already easy to build an app on e.g. a Raspberry Pi Zero W that collects RPIs together with exact timestamps and GPS locations, and then matches these RPIs against published TEKs (which can be fetched continuously with a simple curl cron job, since it seems that there will be no safeguards like device attestation). An adversary will then be able to determine the exact time and location of contact with an infected person, which in turn may make it easy to identify this person. This capability could be limited to developers who can implement this from scratch, but with an open implementation I guess this attack vector may become much more popular.

Keep in mind that users of the official app are told that:

The app is designed so that as little personal data as possible is processed. This means, for example, that the app does not collect any data that would allow the RKI or other users to infer your identity, your health status, or your location.

and right before turning on the broadcasting of RPIs they are further assured that:

The encrypted random codes only provide information about the date, the duration, and the distance to the people around you, calculated from the signal strength. Personal data such as name, address, or location is never collected.

This is true as long as only official apps are in use, but once we have enriched apps (developed on top of an open EN implementation) that provide fine-grained information about the encounters, the above statement no longer holds.

I'm a big supporter of fully open-source / libre software, and I think a solution where users don't have to create a Google account in order to participate in contact tracing would be much better. Unfortunately, the current approach does not ensure full privacy by design, and an open implementation may enable more adversaries to violate the privacy of others.

For those who don't want to use Google products, one possible solution is to get a cheap used Android phone from eBay, create a fake Google account and carry the phone in another pocket (e.g. a Motorola G2, which is capable of running EN, costs 25-30 EUR).

If you decide to go forward with publishing an EN implementation, just make sure that it does not violate any licenses from Google / Apple (e.g. the Beacon Simulator app developer had to remove the iBeacon format implementation from the public source code, despite the fact that the iBeacon specification is open).

kbobrowski avatar Jun 07 '20 14:06 kbobrowski

@kbobrowski One opposing view would be "Theoretically, it would be possible to follow a user, collect the user's RPIs, connect them with person-identifying data, and then check if the person is ever marked as infected. Yet, in practice, this attack vector to deanonymize a user requires a high amount of effort just to gain little additional information compared to the one already gathered while following the user." (quoted from here)

Also we should consider that the CoraLibre-android-sdk discussed above aims at replacing the GMS EN API, not the corona-warn-app. A "nerd attacker" could already today use either GMS EN, or other cheap hardware like you mentioned, and doesn't need to wait for this open/libre implementation of the API.

mh- avatar Jun 07 '20 15:06 mh-

@mh- I don't fully agree with this quote from the documentation. An adversary just needs to get a single RPI to determine whether the person who broadcast this RPI was within the infectious period during the encounter. There is no need to follow this person with a "high amount of effort"; a couple of seconds in the same room is enough.

You are of course right that a "nerd hacker" could already use a custom implementation of EN today, but as long as there is no popular open EN implementation this attacker would at least need to know how to implement it from scratch. This is not a black-or-white situation. I'm sure some people would carry out these "nerd attacks" anyway, but with easily accessible open implementations of EN the situation may take on a somewhat darker shade of grey.

I just wonder about the risk-reward ratio of developing an alternative EN implementation as open source. It's fine if someone who can do it does it for their own responsible use; it's simple enough that it does not require a community effort to develop. Developing it in the open will enable more Google-free users to participate in the contact tracing effort, but at the same time it will increase the number of adversaries who otherwise would not have enough coding skills to perform "nerd attacks", potentially undermining trust in this system. It is difficult to judge the net outcome.

kbobrowski avatar Jun 07 '20 16:06 kbobrowski

@kbobrowski A "nerd attacker" could already use the original Google EN implementation today: once an official app is installed and activated, it will start collecting RPIs.

mh- avatar Jun 07 '20 16:06 mh-

@mh- I don't exactly see how an attacker could use the official app for "nerd attacks". Maybe by turning Bluetooth on only next to the person whose infection status the attacker wants to determine, but that would be very error-prone and require a "high amount of effort" due to the longer time needed in close contact. With a custom EN implementation it's much more precise and requires much less effort. Or do you mean reverse engineering, like using Frida to hijack some functions that interact with Google's EN? Or disabling the signature check so that a custom app can interact with Google's EN? But then again, that requires a higher level of skill, which limits the number of potential attackers.

kbobrowski avatar Jun 07 '20 16:06 kbobrowski

Well, you are talking about nerds... Google makes efforts to access-control the RPIs that its EN implementation stores, not to encrypt them. Nerds will likely be able to override access control on their own device.

mh- avatar Jun 07 '20 16:06 mh-

OK guys, thanks for your thoughts regarding the nerd attack and Open Source philosophy. But please let us focus on the tech again here. If you like, you can create an issue in the documents repo and discuss this topic there. Here in the Android repo all discussions should be tech-only 🙂

MalteJ avatar Jun 07 '20 16:06 MalteJ