spark-privacy-preserver

How to run it?

Open anki-code opened this issue 1 year ago • 0 comments

Hi @ThaminduR! Thank you for your work here!

I'm trying to reproduce the examples using the jupyter/pyspark-notebook:spark-2 Docker container with PySpark 2.4.5 and Python 3.7.6 (as required in the README), but I have had no success. I tried many things to get it running, but I keep getting errors.

Is there a step-by-step guide or a Docker container for testing the code?

What I did:

# Run container
docker run --rm -it --entrypoint /bin/bash jupyter/pyspark-notebook:spark-2

apt update && apt install -y git vim
pip install -U pip

# Install dependencies manually (version specifiers are quoted so the shell does not treat ">" as a redirect)
pip install -U "pandas>=1.1" pyarrow diffprivlib==0.2.1 tabulate==0.8.7 "mypy>=0.770" kmodes

# Install `spark-privacy-preserver`
git clone https://github.com/ThaminduR/spark-privacy-preserver
cd spark-privacy-preserver
pip install --no-deps .
pyspark
# Run the code from mondrian_preserver demo.ipynb

The line:

dfn = Preserver.k_anonymize(df, k, feature_columns, sensitive_column, categorical, schema)
dfn.show()

Output:

ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 13) java.lang.IllegalArgumentException
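
In case it helps with diagnosis, here is a minimal sketch for reporting which versions actually ended up in the container after the steps above. The helper names `module_version` and `environment_report` are made up for illustration and are not part of spark-privacy-preserver; the package list simply mirrors the `pip install` line, and a module that imports but exposes no `__version__` attribute is reported as "unknown".

```python
import importlib
import sys

# Packages from the manual install step, plus pyspark itself.
PACKAGES = ["pyspark", "pandas", "pyarrow", "diffprivlib", "tabulate", "mypy", "kmodes"]


def module_version(name):
    """Return the module's __version__, "unknown" if it has none, or None if import fails."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # A broken install can raise more than ImportError, so catch broadly here.
        return None
    return getattr(module, "__version__", "unknown")


def environment_report(packages=PACKAGES):
    """Map "python" and each package name to the version found in this interpreter."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        report[name] = module_version(name)
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the `pyspark` shell or with plain `python` in the same container should show whether the installed versions match the ones the README expects.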

Could you please help with setting up the environment and running the code? Thanks!

anki-code, Nov 13 '23 16:11