
PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.

39 PipelineDP issues

## Description

Adds `min_sum_per_partition` and `max_sum_per_partition`. These two parameters only work for sum. Only one of the pairs (`min_value`, `max_value`) and (`min_sum_per_partition`, `max_sum_per_partition`) may be set. If (`min_sum_per_partition`, `max_sum_per_partition`) is set, Metrics...
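
The two bounding modes can be contrasted with a minimal sketch. It assumes that (`min_value`, `max_value`) clamps every individual value, while (`min_sum_per_partition`, `max_sum_per_partition`) clamps each privacy unit's total contribution to a partition. `bounded_sums` and its `(privacy_id, partition_key, value)` row format are hypothetical, not PipelineDP's API, and noise and cross-partition contribution bounding are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[str, str, float]  # hypothetical (privacy_id, partition_key, value)


def _clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def _pair_is_set(lo: Optional[float], hi: Optional[float], name: str) -> bool:
    # A bound pair must be given completely or not at all.
    if (lo is None) != (hi is None):
        raise ValueError(f"{name}: set both bounds or neither")
    return lo is not None


def bounded_sums(
    rows: Iterable[Row],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
    min_sum_per_partition: Optional[float] = None,
    max_sum_per_partition: Optional[float] = None,
) -> Dict[str, float]:
    """Per-partition sums after contribution bounding (no noise is added)."""
    per_value = _pair_is_set(min_value, max_value, "value bounds")
    per_partition = _pair_is_set(
        min_sum_per_partition, max_sum_per_partition, "per-partition sum bounds")
    if per_value == per_partition:
        raise ValueError("Set exactly one of (min_value, max_value) and "
                         "(min_sum_per_partition, max_sum_per_partition).")

    # Sum each privacy unit's values within each partition.
    per_unit: Dict[Tuple[str, str], float] = defaultdict(float)
    for privacy_id, partition_key, value in rows:
        if per_value:
            value = _clamp(value, min_value, max_value)  # bound each value
        per_unit[(privacy_id, partition_key)] += value

    # Combine the privacy units of each partition.
    totals: Dict[str, float] = defaultdict(float)
    for (_, partition_key), unit_sum in per_unit.items():
        if per_partition:
            # Bound the unit's whole contribution to this partition.
            unit_sum = _clamp(unit_sum, min_sum_per_partition, max_sum_per_partition)
        totals[partition_key] += unit_sum
    return dict(totals)
```

With per-partition bounds, one privacy unit can change a partition's sum by at most max(|min_sum_per_partition|, |max_sum_per_partition|), however many rows it has in that partition; with per-value bounds, that change also grows with the number of rows per unit, which therefore needs its own limit.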

## Description

This PR extends and complements the basic tutorial on **Apache Beam** by adding thorough explanations of, and references to, the preparatory execution framework for the exercises, as well...

# Context

In this [issue](https://github.com/OpenMined/PipelineDP/issues/10), a framework for Explain calculation reports ([code](https://github.com/OpenMined/PipelineDP/blob/main/pipeline_dp/report_generator.py)) was implemented. Now that most computations have been implemented, we can add information about these computations in...
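
One way to attach per-computation explanations to such a report is sketched below. `ExplainReportSketch`, `add_stage` and `report` are hypothetical names chosen for illustration and are not necessarily the interface in `report_generator.py`. In this sketch a stage may be a plain string or a callable, so that a stage can be described after values such as its privacy budget become known.

```python
from typing import Callable, List, Union

Stage = Union[str, Callable[[], str]]


class ExplainReportSketch:
    """Hypothetical collector of human-readable computation stages."""

    def __init__(self, method_name: str):
        self._method_name = method_name
        self._stages: List[Stage] = []

    def add_stage(self, stage: Stage) -> None:
        # Callables are stored unevaluated and rendered only in report(),
        # so they can refer to values that are filled in later.
        self._stages.append(stage)

    def report(self) -> str:
        lines = [f"Computation report for {self._method_name}:"]
        for i, stage in enumerate(self._stages, start=1):
            text = stage() if callable(stage) else stage
            lines.append(f" {i}. {text}")
        return "\n".join(lines)


if __name__ == "__main__":
    budget = {}
    report = ExplainReportSketch("aggregate")
    report.add_stage("Contribution bounding: at most 1 partition per privacy unit")
    report.add_stage(lambda: f"SUM computed with eps={budget['eps']}")
    budget["eps"] = 0.5  # known only after budget allocation
    print(report.report())
```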

Type: New Feature :heavy_plus_sign:

# Short DP references

**Definition:** Basic (or naive) composition of differentially private mechanisms with parameters `(eps1, delta1)` and `(eps2, delta2)` yields a mechanism with parameters `(eps1+eps2, delta1+delta2)`. There are many...
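
The definition translates directly into code. The sketch below (function names are illustrative) sums the budgets of a sequence of mechanisms run on the same data and, conversely, splits a total budget evenly across `k` mechanisms so that their basic composition stays within it.

```python
from typing import Iterable, Tuple

Budget = Tuple[float, float]  # (epsilon, delta)


def basic_composition(budgets: Iterable[Budget]) -> Budget:
    """(epsilon, delta) of running all mechanisms on the same data under
    basic composition: the epsilons add up and the deltas add up."""
    total_eps, total_delta = 0.0, 0.0
    for eps, delta in budgets:
        if eps < 0 or not 0.0 <= delta <= 1.0:
            raise ValueError(f"invalid budget ({eps}, {delta})")
        total_eps += eps
        total_delta += delta
    return total_eps, total_delta


def split_evenly(total: Budget, k: int) -> Budget:
    """Per-mechanism budget so that k mechanisms compose to `total`."""
    eps, delta = total
    return eps / k, delta / k


if __name__ == "__main__":
    print(basic_composition([(1.0, 1e-6), (0.5, 1e-6)]))
    print(split_evenly((1.0, 1e-5), 4))
```

Basic composition is a worst-case bound; the "many" other composition theorems the issue alludes to give tighter totals for the same mechanisms.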

Type: New Feature :heavy_plus_sign:

# Context

The [PyDP](https://github.com/OpenMined/PyDP) library wraps the [Google C++ building block library](https://github.com/google/differential-privacy/tree/main/cc/algorithms). Its algorithms inherit from the [MetaAlgorithm](https://github.com/OpenMined/PyDP/blob/a88ee73053aa2bdc1be327a77109dd5907ab41d6/src/pydp/algorithms/_algorithm.py#L10) class. For now, the algorithm of most interest for PipelineDP is [Percentile](https://github.com/OpenMined/PyDP/blob/a88ee73053aa2bdc1be327a77109dd5907ab41d6/src/pydp/algorithms/laplacian/_percentile.py#L8), but in future support...

Type: New Feature :heavy_plus_sign:

# Context

Currently, **PipelineDP** supports three execution modes: Apache Spark, Apache Beam, and execution without any framework ([here](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_all_frameworks.py) is an example of how to run on the different frameworks). Basically, the current API...
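
The idea of one API over several execution modes can be sketched as an abstract backend that the aggregation code calls, with one implementation per framework. `BackendSketch`, `LocalBackendSketch`, `views_per_movie` and the `(user_id, movie_id, rating)` row format are hypothetical, and only the framework-free implementation is shown.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple


class BackendSketch(ABC):
    """Hypothetical minimal set of operations an aggregation is written against."""

    @abstractmethod
    def map(self, col: Iterable[Any], fn: Callable[[Any], Any]) -> Iterable[Any]:
        ...

    @abstractmethod
    def sum_per_key(
            self, col: Iterable[Tuple[Hashable, float]]
    ) -> Iterable[Tuple[Hashable, float]]:
        ...


class LocalBackendSketch(BackendSketch):
    """Framework-free mode: operations run on plain Python iterables."""

    def map(self, col, fn):
        return (fn(x) for x in col)  # lazy generator

    def sum_per_key(self, col):
        totals: Dict[Hashable, float] = defaultdict(float)
        for key, value in col:
            totals[key] += value
        return iter(totals.items())


def views_per_movie(backend: BackendSketch,
                    rows: Iterable[Tuple[str, str, float]]):
    """Counts views per movie. The same code runs on any backend that
    implements the two operations above."""
    pairs = backend.map(rows, lambda row: (row[1], 1.0))
    return backend.sum_per_key(pairs)


if __name__ == "__main__":
    rows = [("u1", "m1", 5.0), ("u2", "m1", 3.0), ("u1", "m2", 4.0)]
    print(dict(views_per_movie(LocalBackendSketch(), rows)))
```

A Spark variant could implement `map` with `rdd.map(fn)` and `sum_per_key` with `rdd.reduceByKey(operator.add)`, and a Beam variant with `beam.Map(fn)` and `beam.CombinePerKey(sum)`. PipelineDP's own backends (such as `pipeline_dp.LocalBackend`) are built around the same idea with a much larger set of operations.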

Type: Epic :call_me_hand:

## Description

Currently, PipelineDP supports multiple systems, such as Apache Beam and Apache Spark. While these systems may be used individually, they are definitely not going to be used in a combined...

Type: Improvement :chart_with_upwards_trend:

## Feature Description

DP is valuable in many "traditional" machine learning pipelines, and sklearn is the largest "traditional" ML ecosystem in Python. Would examples or first-class support for scikit-learn `Pipeline`...

Type: Epic :call_me_hand:

Support a Dockerfile and Docker image for testing. @chinmayshah99

Type: New Feature :heavy_plus_sign:

## Question

Is there support for the 2.X.X versions of Apache Spark?

## Further Information

I see a pyspark 3.2.0 dependency in pyproject.toml. But in real enterprise and on-premise clusters, typically...

Type: Question :grey_question: