Context

Note: Terminology might be found here.

To protect each user's data, the contributions of each user must be bounded. There are 2 types of contribution bounding:

Cross-partition contribution bounding is a procedure which ensures that each individual contributes to a limited number of partitions.

Per-partition contribution bounding is a procedure which ensures that each individual’s contribution to any single partition is bounded.

In this task, let's consider only per-partition contribution bounding for sum aggregation. There are 2 options; consider an example of how each works. Let a privacy_id contribute [1,2,3,4,5] to a partition.

Bounding each contribution (this is currently implemented): let min_value = 1, max_value = 3, max_contributions_per_partition = 3 (those values are set in AggregateParams).

Then the bounding algorithm is the following:
- randomly sample 3 elements, let the result be [1,3,5]
- clip values to [1, 3]: [1,3,3]
- sum: 1 + 3 + 3 = 7
- anonymize with DP: 7 + random noise
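The sample, clip, and sum steps of this option can be sketched in plain Python. This is only an illustration of the arithmetic, not the PipelineDP implementation: the function names `clip_and_sum` and `bound_each_contribution` are hypothetical, and the DP noise step is left out.

```python
import random


def clip_and_sum(values, min_value, max_value):
    """Clips every value to [min_value, max_value] and sums the result."""
    return sum(min(max(v, min_value), max_value) for v in values)


def bound_each_contribution(values, min_value, max_value,
                            max_contributions_per_partition, rng=None):
    """Hypothetical helper: sample, then clip each value, then sum.

    The returned sum is the value that DP noise would be added to afterwards.
    """
    rng = rng or random.Random()
    # Keep at most max_contributions_per_partition values, chosen at random.
    k = min(max_contributions_per_partition, len(values))
    sampled = rng.sample(values, k)
    return clip_and_sum(sampled, min_value, max_value)


# The sample from the example: [1, 3, 5] is clipped to [1, 3, 3].
print(clip_and_sum([1, 3, 5], min_value=1, max_value=3))  # 7
```

Because the sample is random, the bounded sum for [1,2,3,4,5] varies between runs, but it always lies between 6 (sample [1,2,3]) and 9 (sample [3,4,5]).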

Bounding aggregated contributions (to implement): let min_sum_per_partition = 0, max_sum_per_partition = 10 (those should be added to AggregateParams).

Then the bounding algorithm is the following:
- sum: 15 = 1+2+3+4+5
- clip sum to [0, 10]: 10
- anonymize with DP: 10 + random noise
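A minimal sketch of the sum and clip steps for this option, again in plain Python. The function name `bound_aggregated_contribution` is hypothetical and not part of PipelineDP; the noise step is omitted.

```python
def bound_aggregated_contribution(values, min_sum_per_partition,
                                  max_sum_per_partition):
    """Hypothetical helper: sums all values, then clips the sum.

    No sampling is needed here, because the clipped sum already bounds
    how much one privacy_id can contribute to the partition.
    """
    total = sum(values)
    return min(max(total, min_sum_per_partition), max_sum_per_partition)


# 1+2+3+4+5 = 15, clipped to [0, 10].
print(bound_aggregated_contribution([1, 2, 3, 4, 5], 0, 10))  # 10
```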

Goal

Implement bounding of aggregated contributions for sum.

This task might be split into the following parts:

1. Add min_sum_per_partition and max_sum_per_partition to AggregateParams.
2. Implement bounding aggregated contributions in SumCombiner (SumCombiner performs the computation of the DP sum).
3. Use group_by_key for this type of contribution bounding instead of sampling with sample_fixed_per_key here.

Note: there is a draft PR (3 files in pipeline_dp/*) for this feature that implements parts 1 and 2, but without tests; it might be a good starting point.
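To illustrate the idea behind part 3, here is a rough sketch that groups values by (privacy_id, partition_key) and clips each group's sum. It uses a plain dictionary as a stand-in for group_by_key; the function `group_and_bound_sums` and the row layout are assumptions made for this example, not the PipelineDP backend API.

```python
from collections import defaultdict


def group_and_bound_sums(rows, min_sum_per_partition, max_sum_per_partition):
    """Hypothetical helper.

    rows: iterable of (privacy_id, partition_key, value) tuples.
    Returns {(privacy_id, partition_key): clipped sum of the values}.
    """
    # Stand-in for group_by_key: collect all values per (privacy_id, partition).
    grouped = defaultdict(list)
    for privacy_id, partition_key, value in rows:
        grouped[(privacy_id, partition_key)].append(value)

    # All values are kept (no sampling); only the per-group sum is clipped.
    return {
        key: min(max(sum(values), min_sum_per_partition), max_sum_per_partition)
        for key, values in grouped.items()
    }


rows = [("u1", "p1", v) for v in [1, 2, 3, 4, 5]] + [("u2", "p1", -4)]
print(group_and_bound_sums(rows, 0, 10))
# {('u1', 'p1'): 10, ('u2', 'p1'): 0}
```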

May 11 '22 17:05 dvadym

I'm Ivy and I'm currently working on this issue.

May 18 '22 07:05 sunchengxuanivy

Thank you!

May 18 '22 07:05 dvadym

This was implemented on PR

Jun 23 '23 12:06 dvadym

Introduce per partition contribution bounding for sum
