PipelineDP
PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark and Apache Beam.
## Description

This PR continues the work started for #264. An additional `PrivatePTransform` is added for vector summation in [private_beam.py](https://github.com/OpenMined/PipelineDP/blob/bb4046265e155f45cdd7b4ee91b8d595c1089c72/pipeline_dp/private_beam.py#L96).

## Affected Dependencies

* VectorSumParams class was added to...
# Context

## Definitions

_Contribution bounding_ is the process of limiting the contributions of a single individual (or an entity represented by a privacy key) to the output dataset or its...
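As an illustration, per-privacy-key contribution bounding can be sketched in plain Python. This is a simplified model, not PipelineDP's actual implementation; the function name and cap parameter are hypothetical:

```python
from collections import defaultdict

def bound_contributions(records, max_contributions_per_key):
    """Keep at most `max_contributions_per_key` records per privacy key.

    `records` is an iterable of (privacy_key, value) pairs. Capping how
    many records any single key contributes limits that individual's
    influence on downstream aggregates, which bounds the sensitivity
    used to calibrate the DP noise.
    """
    kept = []
    seen = defaultdict(int)
    for privacy_key, value in records:
        if seen[privacy_key] < max_contributions_per_key:
            seen[privacy_key] += 1
            kept.append((privacy_key, value))
    return kept

records = [("alice", 1), ("alice", 2), ("alice", 3), ("bob", 5)]
# "alice" is capped at 2 records; "bob" keeps his single record.
bounded = bound_contributions(records, max_contributions_per_key=2)
```

PipelineDP performs this bounding internally (e.g. via cross-partition and per-partition limits); the sketch only shows the core idea of dropping excess records per key.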
# Context

The workflow for computing DP aggregations with PipelineDP is the following (steps that are not important here are omitted; see [the full example](https://github.com/OpenMined/PipelineDP/blob/41b70a3c7e19b82024e2d0f44842aaab570440bd/examples/quickstart.ipynb)):

```
# Define the total budget.
budget_accountant = ...
```
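To make the steps concrete, here is a self-contained simulation of the same workflow in plain Python: bound contributions, aggregate, then add calibrated noise. This is an illustrative sketch, not the PipelineDP API; the function name, the Laplace mechanism via inverse-CDF sampling, and the sensitivity accounting are simplifying assumptions:

```python
import math
import random
from collections import defaultdict

def dp_count(records, epsilon, max_partitions_contributed, seed=None):
    """Simulated differentially private per-partition count.

    `records` is an iterable of (privacy_key, partition_key) pairs.
    """
    rng = random.Random(seed)
    # Step 1: contribution bounding - keep at most
    # `max_partitions_contributed` distinct partitions per privacy key,
    # so removing one individual changes at most that many counts by 1.
    partitions_by_key = defaultdict(list)
    for privacy_key, partition in records:
        kept = partitions_by_key[privacy_key]
        if partition not in kept and len(kept) < max_partitions_contributed:
            kept.append(partition)
    # Step 2: raw per-partition counts of contributing keys.
    counts = defaultdict(int)
    for kept in partitions_by_key.values():
        for partition in kept:
            counts[partition] += 1
    # Step 3: Laplace noise with scale = sensitivity / epsilon.
    scale = max_partitions_contributed / epsilon
    noisy = {}
    for partition, count in counts.items():
        u = rng.random() - 0.5
        noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
        noisy[partition] = count + noise
    return noisy
```

In real PipelineDP these steps are driven by the budget accountant, `AggregateParams`, and the chosen pipeline backend; the sketch only mirrors their order.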
# Context

[DPEngine.aggregate](https://github.com/OpenMined/PipelineDP/blob/66012f04a94720ba2e5499c1c96edb3399b83287/pipeline_dp/dp_engine.py#L53) is an API function that performs DP aggregations. It takes [AggregateParams](https://github.com/OpenMined/PipelineDP/blob/66012f04a94720ba2e5499c1c96edb3399b83287/pipeline_dp/aggregate_params.py#L57) as an argument. `AggregateParams` is a dataclass that specifies the details of the computation to be performed. It...
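For intuition, the shape of such a parameter dataclass can be sketched as follows. The field names mirror common PipelineDP parameters but are illustrative assumptions here, not the exact class definition:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Metric(Enum):
    COUNT = "count"
    SUM = "sum"
    MEAN = "mean"

@dataclass
class AggregateParamsSketch:
    """Illustrative stand-in for PipelineDP's AggregateParams.

    Bundling all computation knobs in one dataclass keeps the
    aggregate() signature stable as options grow.
    """
    metrics: List[Metric]
    max_partitions_contributed: int       # contribution bound across partitions
    max_contributions_per_partition: int  # contribution bound within a partition
    min_value: Optional[float] = None     # per-value clipping range for SUM/MEAN
    max_value: Optional[float] = None

params = AggregateParamsSketch(
    metrics=[Metric.COUNT, Metric.SUM],
    max_partitions_contributed=2,
    max_contributions_per_partition=3,
    min_value=0.0,
    max_value=5.0,
)
```

The real `AggregateParams` in the linked source is authoritative; consult it for the exact fields and defaults.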
# Context

PipelineDP supports anonymization with the Beam API ([example](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_on_beam.py)). It would be valuable to also support the [Beam SQL API](https://beam.apache.org/documentation/dsls/sql/overview/).

# Goal

To investigate and to design a Beam SQL API...
# Context

PipelineDP supports anonymization with the Spark RDD API ([example](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_on_spark.py)). It would be valuable to also support the [Spark SQL API](https://spark.apache.org/sql/).

# Goal

To investigate and to design a Spark SQL API...