Specify the keys.json format
The aggregate explainer specifies that each helper origin should publish public keys for encrypting report payloads at /.well-known/aggregation-service/keys.json. The format of this file should be specified.
Here's a proposed format:
[
  {
    "not_before": "<when key becomes valid, encoded as a string containing an integer timestamp in milliseconds since the Unix epoch>",
    "not_after": "<key expiry, similarly encoded>",
    "keys": [
      {
        "id": "<arbitrary string (up to 128 chars) identifying the key, e.g. a UUID>",
        "key": "<base64-encoded public key>"
      },
      // Optionally, additional keys
      ...
    ]
  },
  // Optionally, additional keysets
  ...
]
Each helper defines at least one keyset containing at least one key. Specifying future keysets avoids races and other timing issues near a keyset's expiry (for example, being forced to refetch keys immediately after downloading them).
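As a rough illustration (not part of the proposal itself), here is how a client might parse this format and select the currently valid keyset; the function and field handling below are assumptions based only on the fields shown above.

import json
import time

def active_keyset(keys_json_text, now_ms=None):
    """Return the keys of the keyset whose validity window covers now, if any."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    for keyset in json.loads(keys_json_text):
        not_before = int(keyset["not_before"])
        not_after = int(keyset["not_after"])
        if not_before <= now_ms < not_after:
            return keyset["keys"]
    return None  # no currently valid keyset; the client would need to refetch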
To limit the impact of a compromised public key, the browser can implement:
Key rotation
Each keyset should be rotated (at least) weekly, so each keyset's validity period (i.e. not_before to not_after) should be no longer than 7 days. Additionally, keysets should not be specified too far in advance (say, each keyset's not_before should be no later than 14 days in the future) in case a future key is compromised. Keysets' validity periods should be non-overlapping (see below).
Key slicing
Each helper server could make multiple public keys available for each keyset. At encryption time, the browser will pick one of the public keys to use, uniformly at random. This selection should be made independently for each report so that the key choice cannot be used to partition reports into separate groups of users. As keysets' validity periods are non-overlapping, a client can be sure it is selecting from all of the currently valid keys without refetching. To limit storage, the number of keys that can be specified should be limited (e.g. to 5).
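To make the two mitigations concrete, here is a sketch of how a client might validate fetched keysets and then choose a key uniformly at random for each report. It assumes the 7-day validity, 14-day lead time, non-overlap, and 5-key limits suggested above; none of the names or limits here are normative.

import secrets

MS_PER_DAY = 24 * 60 * 60 * 1000
MAX_VALIDITY_MS = 7 * MS_PER_DAY     # key rotation: at most 7 days per keyset
MAX_LEAD_TIME_MS = 14 * MS_PER_DAY   # keysets published at most 14 days ahead
MAX_KEYS_PER_KEYSET = 5              # key slicing: storage limit on keys

def validate_keysets(keysets, now_ms):
    """Reject keysets that violate the rotation constraints described above."""
    for keyset in keysets:
        start, end = int(keyset["not_before"]), int(keyset["not_after"])
        assert end - start <= MAX_VALIDITY_MS, "validity period longer than 7 days"
        assert start - now_ms <= MAX_LEAD_TIME_MS, "keyset specified too far in advance"
        assert 1 <= len(keyset["keys"]) <= MAX_KEYS_PER_KEYSET, "bad key count"
    # Validity periods must be non-overlapping.
    windows = sorted((int(k["not_before"]), int(k["not_after"])) for k in keysets)
    for (_, prev_end), (next_start, _) in zip(windows, windows[1:]):
        assert next_start >= prev_end, "overlapping keyset validity periods"

def pick_key_for_report(keys):
    """Pick uniformly at random, independently for each report."""
    return secrets.choice(keys)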
We may need a mechanism to ensure different clients are supplied the same keysets.
This format could also be modified to support versioning the encryption protocol, but this may be unnecessary.
A simpler alternative is to use HTTP caching to allow for key rotation:
The browser would parse keys of the format:
[
  {
    "id": "key_1",
    "key": "<base64-encoded key>"
  },
  {
    "id": "key_2",
    "key": "<base64-encoded key>"
  },
  ...
]
The helper servers can use traditional HTTP caching logic (e.g. Cache-Control) to specify the max age of these keys.
The browser could consider enforcing minimum/maximum ages for public keys in order to provide some guarantees on key rotation and limit the effects of bugs/erroneous headers.
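As a sketch of what such enforcement could look like (the bounds, header parsing, and function below are illustrative assumptions, not part of the proposal):

import re
import urllib.request

MIN_KEY_AGE_SECONDS = 24 * 60 * 60      # assumed minimum age (1 day)
MAX_KEY_AGE_SECONDS = 7 * 24 * 60 * 60  # assumed maximum age (7 days)

def fetch_keys_with_clamped_lifetime(url):
    """Fetch keys.json and clamp the advertised Cache-Control max-age."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
        cache_control = response.headers.get("Cache-Control", "")
    match = re.search(r"max-age=(\d+)", cache_control)
    advertised = int(match.group(1)) if match else 0
    # Clamp to guarantee some rotation and limit the effect of erroneous headers.
    lifetime = min(max(advertised, MIN_KEY_AGE_SECONDS), MAX_KEY_AGE_SECONDS)
    return body, lifetime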
Following up on my comment from today's call, I wanted to call out that some timing information may leak from the encrypted reports. Here is an example attack with some assumptions (note that these assumptions likely aren't accurate; they are just for the sake of the example).
Suppose:
- Public keys and their validity periods are known to all.
- The reports are encrypted with an asymmetric scheme, i.e. E(pk, p) => c and D(sk, c) => p.
- The reports are also authenticated with an Encrypt-then-MAC scheme, i.e. MAC(pk, c) => t. The browser then returns (c, t) in the encrypted report.
Attack:
c, t          # cyphertext and tag of report
public_keys   # dict of {pk: validity_window}
for pk in public_keys.keys():
    if MAC(pk, c) == t:
        print("validity window is {}".format(public_keys[pk]))
This is certainly avoidable, so it's just worth keeping track of. For example, if the keys are valid for much longer than the max delay for the reports, it's unlikely that new information is revealed (though edge cases still exist, e.g. if the key was only valid for 1 minute when the report was received, it reveals that the encryption was run in the last minute).
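To make the leak concrete, here is an illustrative calculation (not from the comment above; the parameter names are assumptions): once the attack identifies the report's key, the encryption time is bounded by the intersection of that key's validity window and the report's possible sending window.

def inferred_encryption_window(not_before_ms, not_after_ms,
                               receipt_time_ms, max_report_delay_ms):
    """Bounds on when encryption happened, given the identified key."""
    earliest = max(not_before_ms, receipt_time_ms - max_report_delay_ms)
    latest = min(not_after_ms, receipt_time_ms)
    return earliest, latest

# E.g. a key whose validity began only 1 minute before the report was received
# narrows the encryption time to that last minute, as noted above.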
In addition to the attack that @eriktaubeneck describes, it is possible to use the choice of key (with or without explicit identifiers) to carry information. If a helper is colluding with sites, this provides a not-so-covert channel between the point at which the report is generated (an advertiser, say) and the helper. Without measures in place to ensure key consistency, you end up needing to trust all helpers rather than just any of them.
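For a rough sense of that channel's capacity (the key count here is just the limit suggested earlier, not a fixed value): with k keys to choose from, each report can carry about log2(k) bits to a colluding helper.

import math

k = 5                            # e.g. the suggested per-keyset key limit
bits_per_report = math.log2(k)   # about 2.32 bits carried per report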