Clustering_based_K_Anon

ran Anonymizer.py file but no output

Open aryansoni1108 opened this issue 7 years ago • 9 comments

I just ran the Anonymizer.py file, but it seems to get stuck during processing, I think. I am pretty new to this type of project, so please help me. I get this output:

Adult data
['c:/Users/Aryan Soni/Downloads/Clustering_based_K_Anon-master/anonymizer.py']
K=10
Begin to K-Member Cluster based on NCP

After this, no further output is shown and cmd is basically stuck. Please help me.

aryansoni1108 Feb 11 '19 08:02

Hi @aryansoni1108, it is not stuck! You didn't get any output because this clustering-based algorithm is very slow (it runs on a single core in a single thread). It takes nearly 3 hours on my laptop (a 2017 15-inch MacBook Pro). You can get better performance with an optimized clustering algorithm. Alternatively, you can get a result sooner with less data (for example, 1000 records of the adult data) or a larger k (20 or 50).

Adult data
['anonymizer.py']
K=10
Begin to K-Member Cluster based on NCP
NCP 11.20%
Running time 10744.34seconds
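
For the "less data" suggestion above, one way to cut the input down is to write a random subset of records to a new file. This is only a minimal sketch: the helper name `sample_records` and the output file name are made up for illustration, and it assumes the data file holds one record per line. How anonymizer.py picks up the smaller file is not shown here and would need to be checked in the script itself.

```python
import random


def sample_records(src_path, dst_path, n=1000, seed=0):
    """Copy a random sample of at most n non-empty lines from src_path to dst_path.

    Hypothetical helper, not part of the repository.
    """
    with open(src_path) as src:
        # Skip blank lines and normalise line endings.
        records = [line.rstrip("\n") for line in src if line.strip()]
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same subset
    subset = rng.sample(records, min(n, len(records)))
    with open(dst_path, "w") as dst:
        dst.write("\n".join(subset) + "\n")
    return len(subset)
```

A call such as `sample_records("data/adult.data", "data/adult_1000.data")` would then produce a 1000-record file (the second path is an assumed name).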

qiyuangong Feb 13 '19 03:02

Hi, a question: which adult and informs datasets are actually used by anonymizer.py? I want to try to cut down the processing time.

I've been running the algorithm (on informs) for the past 4 hours with the data as is (from the GitHub download) and it still hasn't finished :(

Also, where can I find the optimized clustering algorithm on GitHub?

Thank you!

dataExperimenter2019 May 01 '19 14:05

Hi, the datasets are in the data directory. adult.data is the Adult dataset, while conditions.csv and demographics.csv make up the Informs dataset.

As for an optimized clustering algorithm, I think you can start from optimized k-means clustering. Try searching for these keywords with a search engine such as Google.

qiyuangong May 02 '19 08:05

Hi @qiyuangong, this is a great initiative, much appreciated. While running anonymizer.py, this problem occurs. Can you please help with this?

FarihaHossain May 06 '19 05:05

Hi @FarihaHossain, I think this error is caused by a mismatch between IS_CAT and ATT_TREES. It seems that a given attribute is expected to be categorical, but it is actually a NumRange.

Can you give me the detailed running command?
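
A quick consistency check along these lines could help narrow the problem down. This is a rough sketch under stated assumptions, not the project's actual code: it assumes IS_CAT is a list of booleans and ATT_TREES a parallel list in which numeric attributes hold a NumRange object and categorical ones hold something else (for example a dict of tree nodes). The `NumRange` class below is only a stand-in, and `find_type_mismatches` is a made-up helper name.

```python
class NumRange(object):
    """Stand-in for the project's numeric-range class (assumed, not the real one)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high


def find_type_mismatches(is_cat, att_trees):
    """Return the indices where the categorical flag disagrees with the tree type."""
    if len(is_cat) != len(att_trees):
        raise ValueError("IS_CAT and ATT_TREES must have the same length")
    mismatches = []
    for index, (flag, tree) in enumerate(zip(is_cat, att_trees)):
        is_numeric_tree = isinstance(tree, NumRange)
        # A categorical flag paired with a numeric tree (or the reverse) is a mismatch.
        if bool(flag) == is_numeric_tree:
            mismatches.append(index)
    return mismatches
```

With flags `[False, True, True]` and trees `[NumRange(17, 90), {"*": None}, NumRange(0, 99)]`, the check reports index 2, the attribute flagged as categorical but backed by a numeric range.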

qiyuangong May 07 '19 05:05

Hi, the algorithm works fine with the adult data and produced the result in 3 hours, but it has been running for a day and hasn't produced any output for the INFORM dataset with k = 20. [python2 anonymizer.py i kmember 20] The above is the command I used in the terminal, and it has been stuck on "Begin to k-member cluster based on NCP" for the past 20 hours. Can you suggest an update or anything I can do to produce a result?

shivjais13 May 07 '19 14:05

> Hi @FarihaHossain, I think this error is caused by a mismatch between IS_CAT and ATT_TREES. It seems that a given attribute is expected to be categorical, but it is actually a NumRange.
>
> Can you give me the detailed running command?

Hi, thanks for the reply. I just cloned this repository and ran it. Nothing was changed, and this problem came up.

FarihaHossain May 07 '19 21:05

> > Hi @FarihaHossain, I think this error is caused by a mismatch between IS_CAT and ATT_TREES. It seems that a given attribute is expected to be categorical, but it is actually a NumRange. Can you give me the detailed running command?
>
> Hi, thanks for the reply. I just cloned this repository and ran it. Nothing was changed, and this problem came up.

Hi, I ran it in my environment. Everything works well except for a saving error related to the INFORM dataset.

Can you give me more details about your environment and the command you ran?

qiyuangong May 09 '19 14:05

> Hi, the algorithm works fine with the adult data and produced the result in 3 hours, but it has been running for a day and hasn't produced any output for the INFORM dataset with k = 20. [python2 anonymizer.py i kmember 20] The above is the command I used in the terminal, and it has been stuck on "Begin to k-member cluster based on NCP" for the past 20 hours. Can you suggest an update or anything I can do to produce a result?

Well, I will add some output (maybe a progress bar) for that.
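
Until then, progress output of this kind could look roughly like the sketch below. It is only an illustration: `report_progress` is a hypothetical helper, and where it would be called inside the clustering loop is an assumption, since the loop itself is not shown in this thread.

```python
import sys
import time


def report_progress(done, total, start_time, stream=sys.stderr):
    """Write a one-line progress message to the stream and return it."""
    elapsed = time.time() - start_time
    percent = 100.0 * done / total if total else 100.0
    message = "clustered %d/%d records (%.1f%%), %.0fs elapsed" % (
        done, total, percent, elapsed)
    # "\r" returns to the start of the line so each update overwrites the last one.
    stream.write("\r" + message)
    stream.flush()
    return message
```

Calling it once per finished cluster (or every few hundred records) with the number of records assigned so far would at least show that the run is still moving.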

qiyuangong May 09 '19 14:05