K-Anonymity
Tried converting this Python code to PySpark
I converted this Python code to PySpark and am running it on the same dataset on an AWS EMR cluster. For 200 records it takes 9 minutes; for 30,000 records it takes 22.5 hours. Is there any way to optimise the code? Please help. Thanks in advance.
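For what it's worth, here is a quick back-of-the-envelope check of the two timings above. It is only a sketch: the variable names are made up for illustration and are not taken from the actual code, and it uses nothing beyond the two numbers reported here.

```python
# Timings reported above, converted to seconds.
# (Variable names are illustrative, not from the original code.)
small_records, small_seconds = 200, 9 * 60
large_records, large_seconds = 30_000, 22.5 * 3600

# Average wall-clock cost per record in each run.
per_record_small = small_seconds / small_records
per_record_large = large_seconds / large_records

print(f"{per_record_small:.2f} s/record at {small_records} records")
print(f"{per_record_large:.2f} s/record at {large_records} records")
print(f"records grew {large_records / small_records:.0f}x, "
      f"runtime grew {large_seconds / small_seconds:.0f}x")
```

Both runs come out at about 2.7 seconds per record, so the runtime seems to grow roughly in proportion to the number of records. That may point to a fixed cost paid for every record rather than an algorithm that gets worse as the data grows, but I can't tell from these two data points alone.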