PipelineDP
PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
## Description Adds `min_sum_per_partition` and `max_sum_per_partition`. These two parameters only work for sum. Only one of the pairs `(min_value, max_value)` and `(min_sum_per_partition, max_sum_per_partition)` may be set. If `(min_sum_per_partition, max_sum_per_partition)` is set Metrics...
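To illustrate what bounding a per-partition sum could look like, here is a minimal sketch in plain Python. It assumes the bounds clamp each privacy unit's summed contribution to a partition before the partition totals are formed; that reading, the `bounded_partition_sums` helper, and the row layout are all hypothetical and are not taken from PipelineDP's code. No noise is added, so the result is not differentially private on its own.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, float]  # (privacy_id, partition_key, value), hypothetical layout


def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def bounded_partition_sums(rows: Iterable[Row],
                           min_sum_per_partition: float,
                           max_sum_per_partition: float) -> Dict[str, float]:
    """Sum values per partition after clamping each privacy unit's
    per-partition sum (illustrative only, no noise is added)."""
    # Step 1: sum each privacy unit's contributions within each partition.
    per_unit = defaultdict(float)
    for privacy_id, partition, value in rows:
        per_unit[(privacy_id, partition)] += value

    # Step 2: clamp every per-unit sum, then accumulate per partition.
    totals = defaultdict(float)
    for (_, partition), unit_sum in per_unit.items():
        totals[partition] += clamp(unit_sum, min_sum_per_partition,
                                   max_sum_per_partition)
    return dict(totals)


rows = [("u1", "a", 3.0), ("u1", "a", 4.0), ("u2", "a", 1.0), ("u1", "b", -5.0)]
result = bounded_partition_sums(rows, min_sum_per_partition=0.0,
                                max_sum_per_partition=5.0)
```

With these inputs, user `u1` contributes 7.0 to partition `a`, which is clamped to 5.0, so the sensitivity of each partition's sum to any single privacy unit is bounded by the chosen interval.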
## Description This PR extends and complements the basic tutorial on **Apache Beam** by integrating very thorough explanations and references to the preparatory execution framework for the exercises, as well...
# Context In the [issue](https://github.com/OpenMined/PipelineDP/issues/10), a framework for Explain calculation reports ([code](https://github.com/OpenMined/PipelineDP/blob/main/pipeline_dp/report_generator.py)) was implemented. Now that most computations have been implemented, we can add information about these computations in...
# Short DP references **Definition:** Basic (or naive) composition of differential privacy mechanisms with parameters `(eps1, delta1)` and `(eps2, delta2)` is a mechanism with parameter `(eps1+eps2, delta1+delta2)`. There are many...
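The basic composition rule defined above can be sketched in a few lines of Python. The `basic_composition` helper is a hypothetical name used for illustration only; it simply sums the `eps` and `delta` parameters of the composed mechanisms.

```python
from typing import Iterable, Tuple


def basic_composition(params: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (eps, delta) of the basic composition of mechanisms,
    obtained by summing the individual eps and delta values."""
    total_eps = 0.0
    total_delta = 0.0
    for eps, delta in params:
        if eps < 0 or delta < 0:
            raise ValueError("eps and delta must be non-negative")
        total_eps += eps
        total_delta += delta
    return total_eps, total_delta


# Composing a (0.5, 1e-6) mechanism with a (1.0, 2e-6) mechanism.
combined = basic_composition([(0.5, 1e-6), (1.0, 2e-6)])
```

Here `combined` holds a total `eps` of 1.5 and a total `delta` of about 3e-6, which is the privacy budget consumed by running both mechanisms on the same data.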
# Context The [PyDP](https://github.com/OpenMined/PyDP) library wraps the [Google C++ building block library](https://github.com/google/differential-privacy/tree/main/cc/algorithms). Algorithms are inherited from the [MetaAlgorithm](https://github.com/OpenMined/PyDP/blob/a88ee73053aa2bdc1be327a77109dd5907ab41d6/src/pydp/algorithms/_algorithm.py#L10) class. For now, the most interesting one for PipelineDP is the [Percentile](https://github.com/OpenMined/PyDP/blob/a88ee73053aa2bdc1be327a77109dd5907ab41d6/src/pydp/algorithms/laplacian/_percentile.py#L8) algorithm, but in future support...
# Context **PipelineDP** now supports 3 execution modes: with Apache Spark, with Apache Beam, and without frameworks ([here](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_all_frameworks.py) is an example of how to run on different frameworks). Basically the current API...
## Description Currently, PipelineDP supports multiple systems such as Apache Beam, Apache Spark, etc. While these systems would be used individually, they are definitely not going to be used in a combined...
## Feature Description DP is valuable in many "traditional" machine learning pipelines, and sklearn is the largest "traditional" ML ecosystem in Python. Would examples or first-class support for scikit-learn `Pipeline`...
Support a Dockerfile and Docker image for testing. @chinmayshah99
## Question Is there support for the 2.X.X versions of Apache Spark? ## Further Information I see a pyspark 3.2.0 dependency in pyproject.toml. But in real enterprise and on-premise clusters typically...