Introduce per-partition contribution bounding for sum
Context
Note: The terminology can be found here.
To protect each user's data, the contributions of each user must be bounded. There are 2 types of contribution bounding:
Cross-partition contribution bounding is a procedure which ensures that each individual contributes to a limited number of partitions.
Per-partition contribution bounding is a procedure which ensures that each individual’s contribution to any single partition is bounded.
This task considers only per-partition contribution bounding for the sum aggregation. There are 2 options, illustrated by the following example. Suppose a privacy_id contributes the values [1, 2, 3, 4, 5] to a partition.
- Bounding each contribution (currently implemented): let min_value = 1, max_value = 3, max_contributions_per_partition = 3 (these values are set in AggregateParams). The bounding algorithm is then:
  - randomly sample 3 elements, e.g. [1, 3, 5]
  - clip the values to [1, 3]: [1, 3, 3]
  - sum: 1 + 3 + 3 = 7
  - anonymize with DP: 7 + random noise
- Bounding aggregated contributions (to implement): let min_sum_per_partition = 0, max_sum_per_partition = 10 (these should be added to AggregateParams). The bounding algorithm is then:
  - sum: 1 + 2 + 3 + 4 + 5 = 15
  - clip the sum to [0, 10]: 10
  - anonymize with DP: 10 + random noise
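The two options can be compared side by side with a minimal plain-Python sketch. The function names below are hypothetical and are not part of PipelineDP; only the parameter names mirror the ones in this issue. The DP noise step is left out, so each function returns the bounded sum that noise would then be added to.

```python
import random


def bound_each_contribution(values, min_value, max_value,
                            max_contributions_per_partition, rng=None):
    """Option 1 (hypothetical helper): sample, clip each value, then sum."""
    rng = rng or random.Random()
    values = list(values)
    # Keep at most max_contributions_per_partition values, chosen at random.
    if len(values) > max_contributions_per_partition:
        values = rng.sample(values, max_contributions_per_partition)
    # Clip every remaining value to [min_value, max_value] and sum them.
    return sum(min(max(v, min_value), max_value) for v in values)


def bound_aggregated_contributions(values, min_sum_per_partition,
                                   max_sum_per_partition):
    """Option 2 (hypothetical helper): sum all values, then clip the sum."""
    total = sum(values)
    return min(max(total, min_sum_per_partition), max_sum_per_partition)


# The sample [1, 3, 5] from the example: clipped to [1, 3, 3], sum is 7.
print(bound_each_contribution([1, 3, 5], 1, 3, 3))
# All five values: the sum 15 is clipped to 10.
print(bound_aggregated_contributions([1, 2, 3, 4, 5], 0, 10))
```

Note that the second option needs no sampling: every contribution of the privacy_id to the partition takes part in the sum, and only the total is clipped.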
Goal
Implement bounding of aggregated contributions for sum.
This task can be split into the following parts:
1. Add min_sum_per_partition and max_sum_per_partition to AggregateParams.
2. Implement bounding of aggregated contributions in SumCombiner (SumCombiner performs the computation of the DP sum).
3. Implement group_by_key for this type of contribution bounding instead of sampling with sample_fixed_per_key here.
Note: there is a draft PR (3 files in pipeline_dp/*) for this feature which implements parts 1 and 2, but without tests; it might be a good starting point.
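The last part of the split above (grouping instead of sampling) can be illustrated with a rough sketch in plain Python. It does not use PipelineDP's pipeline backends; the function name and the (privacy_id, partition_key, value) row layout are assumptions made only for this illustration.

```python
from collections import defaultdict


def clipped_sums_per_privacy_id_and_partition(rows, min_sum_per_partition,
                                              max_sum_per_partition):
    """Groups rows by (privacy_id, partition_key), sums and clips each group.

    rows: iterable of (privacy_id, partition_key, value) tuples
          (an assumed layout, chosen for this sketch).
    """
    grouped = defaultdict(list)
    # Group by key: all values are kept, nothing is sampled away.
    for privacy_id, partition_key, value in rows:
        grouped[(privacy_id, partition_key)].append(value)
    # Clip the per-group sum to [min_sum_per_partition, max_sum_per_partition].
    return {
        key: min(max(sum(values), min_sum_per_partition), max_sum_per_partition)
        for key, values in grouped.items()
    }


rows = [("u1", "p1", v) for v in [1, 2, 3, 4, 5]]
rows += [("u2", "p1", 4), ("u1", "p2", -3)]
print(clipped_sums_per_privacy_id_and_partition(rows, 0, 10))
```

Each resulting value is the bounded contribution of one privacy_id to one partition; summing these per partition and adding noise would then give the DP sum.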
I'm Ivy and I'm currently working on this issue.
Thank you!
This was implemented on PR