PipelineDP

Introduce per partition contribution bounding for sum

Open · dvadym opened this issue 3 years ago · 2 comments

Context

Note: the terminology used below can be found here.

To protect each user's data, the contribution of each user must be bounded. There are two types of contribution bounding:

Cross-partition contribution bounding is a procedure which ensures that each individual contributes to a limited number of partitions.

Per-partition contribution bounding is a procedure which ensures that each individual’s contribution to any single partition is bounded.
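To make the two definitions concrete, here is a minimal sketch in plain Python. It assumes records of the form (privacy_id, partition_key, value); both functions and their parameter names are hypothetical illustrations, not PipelineDP's API, and both enforce their limit by random sampling, which is one common way to do it.

```python
import random
from collections import defaultdict

# Assumed record layout: (privacy_id, partition_key, value).
Record = tuple[str, str, float]


def bound_cross_partition(records: list[Record], max_partitions: int,
                          rng=random) -> list[Record]:
    """Hypothetical helper: keep each privacy_id's records in at most
    `max_partitions` randomly chosen partitions."""
    partitions_by_user = defaultdict(set)
    for pid, pk, _ in records:
        partitions_by_user[pid].add(pk)
    kept = {}
    for pid, pks in partitions_by_user.items():
        pks = sorted(pks)  # rng.sample() needs a sequence, not a set
        kept[pid] = set(rng.sample(pks, min(len(pks), max_partitions)))
    return [r for r in records if r[1] in kept[r[0]]]


def bound_per_partition(records: list[Record], max_contributions: int,
                        rng=random) -> list[Record]:
    """Hypothetical helper: keep at most `max_contributions` randomly chosen
    values for each (privacy_id, partition_key) pair."""
    grouped = defaultdict(list)
    for pid, pk, value in records:
        grouped[(pid, pk)].append(value)
    bounded = []
    for (pid, pk), values in grouped.items():
        k = min(len(values), max_contributions)
        bounded.extend((pid, pk, v) for v in rng.sample(values, k))
    return bounded
```

Applied one after the other, the first call limits how many partitions a user touches and the second limits how much that user adds to any single partition.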

This task considers only per-partition contribution bounding for sum aggregation. There are two options; to show how each works, suppose a privacy_id contributes the values [1, 2, 3, 4, 5] to a partition.

  1. Bounding each contribution (currently implemented): let min_value = 1, max_value = 3 and max_contributions_per_partition = 3 (these values are set in AggregateParams).

Then the bounding algorithm is the following:
    • randomly sample 3 elements; let the result be [1, 3, 5]
    • clip values to [1, 3]: [1, 3, 3]
    • sum: 1 + 3 + 3 = 7
    • anonymize with DP: 7 + random noise

  2. Bounding aggregated contributions (to be implemented): let min_sum_per_partition = 0 and max_sum_per_partition = 10 (these should be added to AggregateParams). Then the bounding algorithm is the following:
    • sum: 1 + 2 + 3 + 4 + 5 = 15
    • clip sum to [0, 10]: 10
    • anonymize with DP: 10 + random noise
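The two options above can be compared side by side in a minimal sketch. The function names are hypothetical, and the DP step assumes the Laplace mechanism with a caller-supplied scale, since the issue only says "random noise" and does not say how it is calibrated.

```python
import random


def clip(x: float, lower: float, upper: float) -> float:
    return max(lower, min(x, upper))


def bound_each_contribution(values, min_value, max_value,
                            max_contributions_per_partition, rng=random):
    """Option 1: sample, clip every value, then sum (before noise)."""
    k = min(len(values), max_contributions_per_partition)
    sampled = rng.sample(values, k)
    return sum(clip(v, min_value, max_value) for v in sampled)


def bound_aggregated_contributions(values, min_sum_per_partition,
                                   max_sum_per_partition):
    """Option 2: sum all values, then clip the sum (before noise)."""
    return clip(sum(values), min_sum_per_partition, max_sum_per_partition)


def add_laplace_noise(x, scale, rng=random):
    """Assumed DP step: add Laplace(0, scale) noise. The difference of two
    i.i.d. exponentials with mean `scale` is Laplace-distributed."""
    return x + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


values = [1, 2, 3, 4, 5]
print(bound_each_contribution(values, 1, 3, 3))       # between 3 and 9, depending on the sample
print(bound_aggregated_contributions(values, 0, 10))  # 10
print(add_laplace_noise(10, scale=1.0))
```

Option 2 never discards any of the user's values; only the total is limited. The noise scale is left as a parameter here because the issue does not fix it; it would generally grow with the per-partition sensitivity, which under option 2 is governed by min_sum_per_partition and max_sum_per_partition rather than by max_contributions_per_partition.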

Goal

Implement bounding of aggregated contributions for sum.

This task can be split into the following parts:

  1. Add min_sum_per_partition and max_sum_per_partition to AggregateParams.
  2. Implement bounding of aggregated contributions in SumCombiner (SumCombiner computes the DP sum).
  3. For this type of contribution bounding, use group_by_key instead of sample_fixed_per_key here, since all of a user's values in a partition are summed rather than sampled.
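A rough sketch of how these parts could fit together, under several assumptions: SumBoundingParams, PerPartitionSumCombiner, group_by_key and bounded_partition_sums below are hypothetical plain-Python stand-ins (not the real AggregateParams, SumCombiner or pipeline-backend APIs), and records are assumed to be (privacy_id, partition_key, value) tuples. Grouping by (privacy_id, partition_key) keeps every contribution, so the whole per-user sum is clipped instead of a sample of it.

```python
import dataclasses
from collections import defaultdict


@dataclasses.dataclass
class SumBoundingParams:
    """Hypothetical stand-in for the two new AggregateParams fields."""
    min_sum_per_partition: float
    max_sum_per_partition: float


class PerPartitionSumCombiner:
    """Hypothetical combiner; method names loosely follow a create/merge
    accumulator pattern and are not PipelineDP's actual signatures."""

    def __init__(self, params: SumBoundingParams):
        self._params = params

    def create_accumulator(self, values) -> float:
        # `values` are ALL contributions of one privacy_id to one partition:
        # sum them first, then clip the total.
        total = sum(values)
        return max(self._params.min_sum_per_partition,
                   min(total, self._params.max_sum_per_partition))

    def merge_accumulators(self, acc1: float, acc2: float) -> float:
        # Clipped per-user sums are simply added across users.
        return acc1 + acc2


def group_by_key(pairs):
    """Plain-Python analogue of a pipeline group_by_key: key -> list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def bounded_partition_sums(records, combiner):
    # Key by (privacy_id, partition_key) and keep every value: no sampling.
    per_user = group_by_key(((pid, pk), v) for pid, pk, v in records)
    sums = {}
    for (_, pk), values in per_user.items():
        acc = combiner.create_accumulator(values)
        sums[pk] = combiner.merge_accumulators(sums[pk], acc) if pk in sums else acc
    return sums  # DP noise would be added to each partition sum afterwards
```

With min_sum_per_partition = 0 and max_sum_per_partition = 10, a user contributing [1, 2, 3, 4, 5] to a partition adds exactly 10 to that partition's pre-noise sum, matching the worked example in the issue.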

Note: there is a draft PR for this feature (3 files in pipeline_dp/*) that implements parts 1 and 2 but has no tests; it might be a good starting point.

dvadym, May 11 '22 17:05

I'm Ivy and I'm currently working on this issue.

sunchengxuanivy, May 18 '22 07:05

Thank you!

dvadym, May 18 '22 07:05

This was implemented in PR

dvadym, Jun 23 '23 12:06