
Feat: Add Skeleton to support Dataset anonymization with differential privacy

Open ETHenzlere opened this issue 1 year ago • 0 comments

DP-Anonymization for BenchBase

To measure the influence of modern anonymization techniques on datasets and their query performance, we would like to add support for differential privacy mechanisms to BenchBase. This PR adds the skeleton of our approach to the codebase. We would like to introduce a new flag, --anonymize=true, that anonymizes tables after the loading step and before execution.

The anonymization will:

  1. Pull data from the DBMS via the JDBC connection already specified in the config
  2. Run DP-algorithms on the data to create a new, synthetic dataset
  3. Push the synthetic data back to the DBMS as an anonymized copy
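
The three steps above can be sketched roughly as follows. This is only an illustration, not the algorithm used in this PR: it assumes a single categorical column with a publicly known value domain, uses an in-memory SQLite connection as a stand-in for the JDBC connection, and swaps in a simple Laplace-noised histogram as the DP mechanism. The names `anonymize_column`, `laplace_noise`, and the `_anonymized` table suffix are hypothetical.

```python
import random
import sqlite3
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def anonymize_column(conn, table, column, domain, epsilon, rng):
    # Step 1: pull the data from the DBMS.
    # (Identifiers are interpolated for brevity; they must come from trusted config.)
    rows = [r[0] for r in conn.execute(f"SELECT {column} FROM {table}")]

    # Step 2: build a histogram over the public domain and add Laplace noise.
    # Adding or removing one row changes one count by 1, so the scale is 1/epsilon.
    counts = Counter(rows)
    noisy = [max(0.0, counts.get(v, 0) + laplace_noise(1.0 / epsilon, rng))
             for v in domain]

    # Sampling from the noisy histogram is post-processing; the synthetic
    # row count is also derived from the noisy counts, not the true count.
    total = sum(noisy)
    size = round(total)
    synthetic = rng.choices(domain, weights=noisy, k=size) if total > 0 else []

    # Step 3: push the synthetic data back as an anonymized copy.
    target = f"{table}_anonymized"
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} ({column} TEXT)")
    conn.executemany(f"INSERT INTO {target} VALUES (?)",
                     [(v,) for v in synthetic])
    return target
```

The original table is left untouched, so queries can be run against both copies to compare performance.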

The anonymization information must be provided in the config file. The process works with minimal information but also allows fine-tuning. A separate README file lists all the features and how to use them: /scripts/anonymization/README.md

Minimal config:

<anonymization>
    <table name="item">
        <differential_privacy />
    </table>
</anonymization>
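
As a rough idea of what parsing this minimal config could look like, here is a sketch using Python's standard `xml.etree.ElementTree`. The function name `tables_to_anonymize` is hypothetical, and the sketch assumes the `<anonymization>` element is either the document root or nested somewhere inside the BenchBase config; the actual parser in the PR may differ.

```python
import xml.etree.ElementTree as ET

MINIMAL_CONFIG = """
<anonymization>
    <table name="item">
        <differential_privacy />
    </table>
</anonymization>
"""


def tables_to_anonymize(xml_text):
    """Return the names of tables that request differential privacy."""
    root = ET.fromstring(xml_text)
    # Accept <anonymization> as the root or as a nested element.
    node = root if root.tag == "anonymization" else root.find(".//anonymization")
    if node is None:
        return []
    return [
        table.get("name")
        for table in node.findall("table")
        if table.find("differential_privacy") is not None
    ]


print(tables_to_anonymize(MINIMAL_CONFIG))
```

An empty `<differential_privacy />` element is treated as "anonymize with defaults", which matches the idea that the process works with minimal information.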

Sensitive value handling is one feature we want to add to the process immediately. It replaces the actual values of specified columns with fake ones. The code has already been written, tested, and used privately within BenchBase.

The column faking approach will be decoupled from differential privacy, to allow for more control.

<anonymization>
    <table name="item">
        <differential_privacy> ... </differential_privacy>
        <value_faking> ... </value_faking>
    </table>
</anonymization>
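
To illustrate what value faking on a column might look like, here is a minimal sketch. It is an assumption, not the implementation referred to above: each distinct real value is mapped to a random lowercase string, and the same real value always receives the same fake one so that equality-based queries still behave similarly. The function name `fake_column` and its parameters are hypothetical.

```python
import random
import string


def fake_column(values, seed=0, length=8):
    """Replace each value with a fake string, consistently per distinct value."""
    rng = random.Random(seed)
    mapping = {}   # real value -> fake value
    used = set()   # fake values already handed out, to avoid collisions
    faked = []
    for value in values:
        if value not in mapping:
            candidate = "".join(rng.choices(string.ascii_lowercase, k=length))
            while candidate in used:
                candidate = "".join(rng.choices(string.ascii_lowercase, k=length))
            used.add(candidate)
            mapping[value] = candidate
        faked.append(mapping[value])
    return faked


print(fake_column(["alice", "bob", "alice"]))
```

Note that a consistent mapping like this hides the values themselves but keeps their frequencies, so it is not a differential privacy mechanism on its own, which is one reason to keep the two steps separately configurable.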

Disclaimer: To reduce complexity, the anonymization itself is not part of this PR. Currently, the anonymization flag calls the script and parses the config. The rest of the code is ready to be added.

Architecture Benchbase.pdf

ETHenzlere • Jan 24 '24 09:01