[Aggregated API] Using the API for both low latency reactive monitoring and detailed client reporting
Hello,
There are cases where we want to use the aggregated API for two quite different use cases:
- A low-latency campaign monitoring system, where knowing attributed sales with little delay is paramount for correct delivery; little to no delay can be tolerated.
- Detailed client reporting, where the precision and richness of the data presented are key. Here delays are more acceptable (i.e. some trade-off can be made between delay and signal-to-noise ratio).
We struggle to articulate the two use cases within the API in its current form. Because the data can only be processed once, we have to sacrifice one of the use cases: either use one detailed encoding and process the data hourly, in which case use case 2 gets drowned by noise, or process the data daily and sacrifice use case 1.
Supporting the two use cases at the same time could be done by allowing several passes over the data in the aggregation service. To keep the differential privacy properties of the aggregation service, it could keep track of the already consumed budget (e.g. the first pass uses ε/4, the second ε/2, and the last ε/4). Another approach would be to define broad key spaces (e.g. split the 128-bit key space into 4 buckets) and allow aggregation only once per key space. This way one would encode the fast-paced campaign monitoring metrics in the first key space and query the aggregation service hourly for them, and encode the client reporting metrics elsewhere and aggregate them weekly.
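To make the first idea concrete, here is a rough Python sketch of what per-batch budget tracking could look like on the aggregation service side. All class and method names (`BatchBudget`, `try_consume`, `max_passes`) are made up for illustration and are not part of the actual API:

```python
# Hypothetical sketch of the first approach: the aggregation service tracks
# how much of the per-batch privacy budget each pass has consumed, and
# rejects queries that would exceed the total epsilon.

class BatchBudget:
    """Tracks consumed differential-privacy budget for one report batch."""

    def __init__(self, total_epsilon: float, max_passes: int = 4):
        self.total_epsilon = total_epsilon
        self.max_passes = max_passes   # also bounds storage, see below
        self.consumed = 0.0
        self.passes = 0

    def try_consume(self, epsilon: float) -> bool:
        """Reserve `epsilon` for one aggregation pass; False if over budget."""
        if self.passes >= self.max_passes:
            return False
        if self.consumed + epsilon > self.total_epsilon:
            return False
        self.consumed += epsilon
        self.passes += 1
        return True


# Example: the epsilon split mentioned above (ε/4, then ε/2, then ε/4).
eps = 4.0
budget = BatchBudget(total_epsilon=eps)
assert budget.try_consume(eps / 4)   # fast hourly monitoring pass
assert budget.try_consume(eps / 2)   # detailed weekly reporting pass
assert budget.try_consume(eps / 4)   # reserved "regret" pass
assert not budget.try_consume(0.1)   # budget exhausted, pass denied
```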
Both methods have their pros and cons: the latter is more precise (as one doesn't burn part of the budget on both use cases at the same time), while the former allows for less regret (i.e. one can always reserve some budget for a last aggregation in case of a mistake).
For both methods, the storage space needed by the aggregation service can be controlled by setting a sensible but low limit on the number of times the data can be processed.
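For completeness, here is a similarly rough sketch of the second approach, where the limit on processing is "once per key space": the 128-bit key space is split into four buckets via the top two bits, and each bucket can only be aggregated once. Again, `KeySpaceLedger` and the helper functions are hypothetical names for illustration:

```python
# Hypothetical sketch of the second approach: split the 128-bit aggregation
# key space into four buckets using the top two bits, and allow each bucket
# to be aggregated only once.

NUM_BUCKETS = 4
BUCKET_BITS = 2          # log2(NUM_BUCKETS)
KEY_BITS = 128

def bucket_of(key: int) -> int:
    """Bucket index encoded in the top two bits of a 128-bit key."""
    return key >> (KEY_BITS - BUCKET_BITS)

class KeySpaceLedger:
    """Allows at most one aggregation pass per key-space bucket."""

    def __init__(self):
        self.aggregated = set()

    def try_aggregate(self, bucket: int) -> bool:
        if bucket in self.aggregated:
            return False   # this key space was already consumed
        self.aggregated.add(bucket)
        return True

# Example: bucket 0 carries the fast campaign-monitoring keys (queried
# hourly), bucket 1 the detailed client-reporting keys (queried weekly).
ledger = KeySpaceLedger()
monitoring_key = (0 << (KEY_BITS - BUCKET_BITS)) | 0xABCD
reporting_key = (1 << (KEY_BITS - BUCKET_BITS)) | 0x1234
assert ledger.try_aggregate(bucket_of(monitoring_key))
assert ledger.try_aggregate(bucket_of(reporting_key))
assert not ledger.try_aggregate(bucket_of(monitoring_key))  # second pass denied
```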