K-Anonymity: Tried converting this Python code to PySpark


Open ayyappa428 opened this issue 3 years ago • 0 comments

I tried converting this Python code to PySpark, and I am running the PySpark version on the same dataset in an AWS EMR cluster. For 200 records it takes 9 minutes. For 30,000 records it takes 22.5 hours. Is there any way to optimise the code? Please help me. Thanks in advance.
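For reference, here is a quick check of the two timings above, as a minimal sketch. The `runs` list and the helper `seconds_per_record` are names made up for illustration; the only data used are the two figures reported in this issue.

```python
# Timings reported in this issue: (number of records, elapsed seconds).
runs = [(200, 9 * 60), (30_000, 22.5 * 3600)]


def seconds_per_record(records, elapsed_seconds):
    """Average wall-clock time spent per record (hypothetical helper)."""
    return elapsed_seconds / records


rates = [seconds_per_record(n, t) for n, t in runs]

for (n, _), rate in zip(runs, rates):
    print(f"{n} records: {rate:.2f} s per record")
```

Both runs come out at about 2.7 seconds per record, so the run time seems to grow roughly linearly with the number of records. That might point to a fixed cost paid once per record rather than a super-linear algorithm, but this is only a guess from two data points.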

Jul 29 '21 16:07 ayyappa428
