attribution-reporting-api
‘Extra Report Delay’ for Aggregate API
I'd like to propose adding an ‘extra report delay’ field to the aggregatable report’s shared info, and to the definition of the shared ID for the Aggregate API, to partially address the impact of report loss caused by delays on the Aggregate API.
Context
Aggregatable reports are currently scheduled to be sent with a random delay of between 10 minutes and 1 hour. However, for a variety of reasons, reports may be delayed further; for example, the user's device may be offline when the report is scheduled to be sent.
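To make the "scheduled report time" concrete, here is a minimal sketch of how a report's scheduled time could be derived from its trigger time. It assumes the random delay is drawn uniformly between 10 minutes and 1 hour; the issue only says "random", so the uniform distribution and the function name `scheduled_report_time` are hypothetical.

```python
import random
from datetime import datetime, timedelta

# Bounds of the random delay described above (10 minutes to 1 hour).
MIN_DELAY = timedelta(minutes=10)
MAX_DELAY = timedelta(hours=1)


def scheduled_report_time(trigger_time: datetime, rng: random.Random) -> datetime:
    """Hypothetical: trigger time plus a delay drawn uniformly from [MIN_DELAY, MAX_DELAY].

    The uniform distribution is an assumption of this sketch, not the browser's spec.
    """
    delay_s = rng.uniform(MIN_DELAY.total_seconds(), MAX_DELAY.total_seconds())
    return trigger_time + timedelta(seconds=delay_s)


trigger = datetime(2023, 5, 1, 12, 0)
rng = random.Random(0)  # seeded so the example is reproducible
t = scheduled_report_time(trigger, rng)
print(t - trigger)
```

Any delay on top of this scheduled time (for example, an offline device) is what the proposal calls the extra report delay.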
Currently, the definition of the shared ID prevents an ad tech from processing a “delayed aggregatable report” if aggregatable reports with the same shared ID have already been processed. For example, assume an ad tech's batching strategy is to start processing reports 2 hours after they were all scheduled to arrive. If the ad tech later tries to process a batch of reports that arrived with a longer delay (i.e., after processing had started), the Aggregation Service will reject it. See, for example, item #1 in https://github.com/WICG/attribution-reporting-api/issues/716
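The rejection described above can be modeled with a simplified toy service in which each shared ID may be consumed by at most one batch. This is an illustration only: `ToyAggregationService` is hypothetical, the shared ID is represented by an opaque string rather than the real hash over shared info fields, and noise is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    shared_id: str  # hypothetical opaque stand-in for the real shared ID
    value: int


class ToyAggregationService:
    """Simplified model: once a shared ID is processed, later batches containing it are rejected."""

    def __init__(self) -> None:
        self.consumed: set[str] = set()

    def process_batch(self, batch: list[Report]) -> int:
        ids = {r.shared_id for r in batch}
        already = ids & self.consumed
        if already:
            raise ValueError(f"shared IDs already processed: {sorted(already)}")
        self.consumed |= ids
        return sum(r.value for r in batch)  # noise omitted in this sketch


svc = ToyAggregationService()
on_time = [Report("origin-A|2023-05-01T12", 5), Report("origin-A|2023-05-01T12", 3)]
late = [Report("origin-A|2023-05-01T12", 4)]  # same shared ID, arrived after processing began

first_total = svc.process_batch(on_time)  # 8
late_rejected = False
try:
    svc.process_batch(late)
except ValueError:
    late_rejected = True  # the delayed report cannot be processed
```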
Proposal
With this change, the browser will include in the aggregatable report a new field, extra_report_delay, which reflects how long the report was delayed in being delivered to the ad tech endpoint, beyond the intended random delay. In other words, it’s the difference between the delivery time and the scheduled report time.
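As a minimal sketch of the definition above, extra_report_delay is the delivery time minus the scheduled report time. The helper name is hypothetical, and clamping negative differences to zero is an assumption of this sketch rather than something the proposal specifies.

```python
from datetime import datetime, timedelta


def extra_report_delay(scheduled_report_time: datetime, delivery_time: datetime) -> timedelta:
    """Hypothetical helper: delivery time minus scheduled report time.

    Clamping negative values to zero is an assumption of this sketch.
    """
    return max(delivery_time - scheduled_report_time, timedelta(0))


scheduled = datetime(2023, 5, 1, 12, 40)
delivered = datetime(2023, 5, 2, 3, 40)  # e.g. the device was offline overnight
print(extra_report_delay(scheduled, delivered))  # 15:00:00
```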
To minimize the performance impact on the Aggregation Service, we expect to bucket the extra_report_delay field, for example: no/little delay (<= 2 hrs), some delay (> 2 hrs and <= 24 hrs), and long delay (> 24 hrs).
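The example buckets could be assigned as in the following sketch. The bucket names mirror the example above, but `DelayBucket` and `bucket_extra_delay` are hypothetical, and placing exactly 2 hours in the first bucket and exactly 24 hours in the second is an assumed boundary convention.

```python
from datetime import timedelta
from enum import Enum


class DelayBucket(Enum):
    NO_OR_LITTLE = "no/little delay"  # <= 2 hours
    SOME = "some delay"               # > 2 hours and <= 24 hours
    LONG = "long delay"               # > 24 hours


def bucket_extra_delay(delay: timedelta) -> DelayBucket:
    """Hypothetical mapping from extra_report_delay to the example buckets."""
    if delay <= timedelta(hours=2):
        return DelayBucket.NO_OR_LITTLE
    if delay <= timedelta(hours=24):
        return DelayBucket.SOME
    return DelayBucket.LONG


for d in (timedelta(minutes=30), timedelta(hours=15), timedelta(days=3)):
    print(d, "->", bucket_extra_delay(d).value)
```

Coarse buckets keep the number of distinct shared IDs small, which is the performance concern noted above.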
By expanding the definition of the shared ID, ad techs could generate summary reports from the aggregatable reports that arrive with little or no delay, and process delayed reports later. How to batch reports using the extra_report_delay field will depend on balancing utility and privacy. Two illustrative examples:
- The ad tech continues to batch and process reports regardless of the extra delay value. The summary reports will include the same level of noise as today, but the ad tech may not be able to process delayed reports.
- The ad tech batches and processes reports separately by extra_report_delay. Assuming the first bucket of the field is no/little delay (<= 2 hrs), the ad tech can process all reports with the extra_report_delay value of "no/little delay" as early as two hours after the scheduled report time and generate a summary report (with noise drawn). Later, when the ad tech processes the longer-delayed aggregatable reports, another summary report will be generated (with noise drawn again).
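The second strategy can be sketched by treating the delay bucket as part of the shared ID, so that a late batch no longer collides with the one already processed. Everything here is hypothetical: `base_shared_id` stands in for today's shared ID fields, the toy service is a simplified model of the Aggregation Service's "process each shared ID once" behavior, and noise (which would be drawn separately for each summary report) is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    base_shared_id: str  # hypothetical stand-in for today's shared ID fields
    delay_bucket: str    # bucketed extra_report_delay, e.g. "no/little delay"
    value: int

    @property
    def shared_id(self) -> tuple[str, str]:
        # Proposed expansion: the delay bucket becomes part of the shared ID.
        return (self.base_shared_id, self.delay_bucket)


class ToyAggregationService:
    """Simplified model: each (expanded) shared ID may be processed only once."""

    def __init__(self) -> None:
        self.consumed: set[tuple[str, str]] = set()

    def process_batch(self, batch: list[Report]) -> int:
        ids = {r.shared_id for r in batch}
        if ids & self.consumed:
            raise ValueError("shared ID already processed")
        self.consumed |= ids
        return sum(r.value for r in batch)  # noise omitted in this sketch


def batch_by_bucket(reports: list[Report]) -> dict[str, list[Report]]:
    """Group pending reports by their extra_report_delay bucket."""
    batches: dict[str, list[Report]] = defaultdict(list)
    for r in reports:
        batches[r.delay_bucket].append(r)
    return dict(batches)


svc = ToyAggregationService()
first_wave = [Report("origin-A|2023-05-01T12", "no/little delay", 5),
              Report("origin-A|2023-05-01T12", "no/little delay", 3)]
second_wave = [Report("origin-A|2023-05-01T12", "long delay", 4)]

# Process the "no/little delay" reports early, then the long-delayed ones later.
early_total = svc.process_batch(batch_by_bucket(first_wave)["no/little delay"])
late_total = svc.process_batch(batch_by_bucket(second_wave)["long delay"])
```

Unlike the single shared ID in today's model, the late batch is accepted here because its expanded shared ID differs; the cost is a second summary report with its own noise.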
We are looking for feedback on this proposal, especially on the following:
- Despite the additional noise generated when delayed reports are processed, would this be more useful than the current state, in which such delayed reports cannot be processed?
- To inform the definition of delay buckets: what is the typical processing strategy of ad techs?