Mondrian
Final low and high values of the partitions
I noticed that in the Mondrian code, the parent low and high values along a dimension are only updated when that dimension is chosen as an allowed dimension. I have a few concerns about that:
- The choice of dimension depends on the low and high values, so won't the choice be made using stale, incorrect values?
- Once a dimension cannot be split any further, its low and high values are no longer updated. However, a split on some other allowable dimension may change the range of this dimension as well.
Hi @prajwal1210
Sorry for the late reply. :)
Answers to your concerns:
- The basic idea of generalization is to replace exact values with ranges, so that the published result is less precise but still correct. This technique is not perfect and does not work in all cases.
- Correct. Splitting on one dimension may change the range of another dimension, but that won't hurt data anonymization.
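To illustrate the second point, here is a minimal sketch. It is not the repository's code: `tight_bounds` and the sample records are hypothetical. It shows that after a split on one dimension, the tight range of another dimension can shrink, while the older, wider range still covers every record. Publishing the wider range is therefore still correct, only less precise.

```python
def tight_bounds(records, dim):
    """Return the (low, high) range actually spanned by records along dim."""
    values = [r[dim] for r in records]
    return min(values), max(values)


# Hypothetical records: (age, education_years)
records = [(25, 9), (30, 10), (41, 13), (52, 16)]

# Range of dimension 1 before any split.
parent_range = tight_bounds(records, 1)

# Split on dimension 0 (age); records with age <= 30 go to the left child.
left = [r for r in records if r[0] <= 30]
right = [r for r in records if r[0] > 30]

# The tight range of dimension 1 in the left child is narrower than the parent's.
left_range = tight_bounds(left, 1)

# The stale parent range still contains every value in the left child.
covers = all(parent_range[0] <= r[1] <= parent_range[1] for r in left)

print(parent_range, left_range, covers)
```

Recomputing the bounds after every split would give narrower published ranges, at the cost of an extra pass over the partition.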
Have a nice day! Qiyuan
Hello, I just wanted to ask what data exactly gets anonymized. I am running the code according to the instructions and I can't quite understand what goes inside anonymized.data. I am sorry if this sounds like a "stupid" question, but I am new to this.
Thank you!
Hi @3ndri. There are no stupid questions, only stupid answers.
In short, identifiers (such as a phone number) should be removed, while QIDs (quasi-identifiers, such as age and gender) are anonymized by k-anonymity algorithms (e.g., Mondrian or others). All other attributes, including sensitive values, remain untouched.
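As a rough sketch of that split between column types (the function names, column names, and the toy generalization below are hypothetical and stand in for what Mondrian actually computes):

```python
def publish(rows, id_cols, qid_cols, generalize):
    """Drop identifiers, generalize quasi-identifiers, keep everything else."""
    out = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if col in id_cols:
                continue  # identifiers are removed entirely
            if col in qid_cols:
                new_row[col] = generalize(col, value)  # QIDs are generalized
            else:
                new_row[col] = value  # sensitive values stay untouched
        out.append(new_row)
    return out


def toy_generalize(col, value):
    """Toy stand-in for Mondrian: ten-year age bands, '*' for other QIDs."""
    if col == "age":
        low = (value // 10) * 10
        return f"{low}-{low + 9}"
    return "*"


rows = [{"phone": "555-0100", "age": 34, "gender": "F", "disease": "flu"}]
result = publish(rows, {"phone"}, {"age", "gender"}, toy_generalize)
print(result)
```

In the real algorithm the ranges come from the partitions Mondrian builds, not from fixed bands.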
Hope this information can help you. :)
But which column is the phone number in adult.data?
Also, the output is the same whether I run it with k=10 or k=20.
But which column is the phone number in adult.data?
Identifiers (phone number, personal ID, or others) were already removed before the dataset was made available.
Also, the output is the same whether I run it with k=10 or k=20.
No. They differ in NCP, which measures information loss (a higher NCP means more loss). Please read README.md and check the output directory.
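For intuition, here is a minimal sketch of how NCP is commonly defined for a numeric attribute: the width of the generalized range divided by the width of the attribute's full domain. The function name and the numbers are hypothetical, and the repository's exact computation may differ.

```python
def ncp_numeric(low, high, global_low, global_high):
    """Width of a generalized range relative to the attribute's full domain."""
    if global_high == global_low:
        return 0.0
    return (high - low) / (global_high - global_low)


# Hypothetical age attribute spanning 20..60 over the whole dataset.
narrow = ncp_numeric(30, 35, 20, 60)  # a small group with a narrow range
wide = ncp_numeric(30, 50, 20, 60)    # a larger group with a wider range

print(narrow, wide)
```

A larger k typically forces larger groups and therefore wider ranges, which is why the NCP can change even when the output files look similar at first glance.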
But what does the output over K=10 mean? The one which reads:
[[], ['State-gov', 'Self-emp-not-inc', 'Private', 'Federal-gov', 'Local-gov', 'Self-emp-inc', 'Without-pay'], [], ['Never-married', 'Married-civ-spouse', 'Divorced', 'Married-spouse-absent', 'Separated', 'Married-AF-spouse', 'Widowed'], ['Adm-clerical', 'Exec-managerial', 'Handlers-cleaners', 'Prof-specialty', 'Other-service', 'Sales', 'Transport-moving', 'Farming-fishing', 'Machine-op-inspct', 'Tech-support', 'Craft-repair', 'Protective-serv', 'Armed-Forces', 'Priv-house-serv'], ['White', 'Black', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other'], ['Male', 'Female'], ['United-States', 'Cuba', 'Jamaica', 'India', 'Mexico', 'Puerto-Rico', 'Honduras', 'England', 'Canada', 'Germany', 'Iran', 'Philippines', 'Poland', 'Columbia', 'Cambodia', 'Thailand', 'Ecuador', 'Laos', 'Taiwan', 'Haiti', 'Portugal', 'Dominican-Republic', 'El-Salvador', 'France', 'Guatemala', 'Italy', 'China', 'South', 'Japan', 'Yugoslavia', 'Peru', 'Outlying-US(Guam-USVI-etc)', 'Scotland', 'Trinadad&Tobago', 'Greece', 'Nicaragua', 'Vietnam', 'Hong', 'Ireland', 'Hungary', 'Holand-Netherlands']]
Oh I get it now, those are the quasi-identifiers
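A small sketch of how a list shaped like this could be produced, assuming (this is a guess from the output, not confirmed in the thread) that each inner list holds the distinct values of one categorical quasi-identifier and that the empty lists correspond to numeric ones. The function name and sample rows are hypothetical.

```python
def categorical_domains(rows, is_categorical):
    """Collect distinct values per column, leaving numeric columns empty."""
    domains = [[] for _ in is_categorical]
    for row in rows:
        for i, value in enumerate(row):
            if is_categorical[i] and value not in domains[i]:
                domains[i].append(value)
    return domains


# Hypothetical rows: (age, workclass, sex)
rows = [
    (39, "State-gov", "Male"),
    (50, "Self-emp-not-inc", "Male"),
    (38, "Private", "Female"),
]
domains = categorical_domains(rows, [False, True, True])
print(domains)
```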
I have a question about which database this program calls
Can you help me annotate the program? As a novice, I don't understand it.
Hi @Arigato97
This program uses the adult dataset (https://github.com/qiyuangong/Mondrian/blob/master/data/adult.data) by default, and can be switched to the INFORMS dataset (https://github.com/qiyuangong/Mondrian/blob/master/data/conditions.csv and https://github.com/qiyuangong/Mondrian/blob/master/data/demographics.csv).
Can you add a few more comments to the program? It is a little difficult for me. Please help.
I can't follow some parts of the program and am not sure what exactly they do. Could you add more comments? Thanks.
Sorry, no more comments or features will be added.