synthcity Question about k anonymity metric

Open amad-person opened this issue 2 months ago • 0 comments

I have a question about interpreting synthcity's k-anonymity metric for a synthetic dataset.

Consider the following example train dataset:

	Age	Gender	Zip Code	Medical Condition
1	25	F	10000	Condition X
...	...	...	...	...
n	30	M	20000	Condition Y

Here, the sensitive feature is Medical Condition.

Suppose a synthetic dataset has k = 1 because it contains only one row with this combination of Age, Gender, and Zip Code:

	Age	Gender	Zip Code	Medical Condition
1	25	F	10000	Condition Y

Here, the sensitive value in the synthetic dataset (Condition Y) is not the true value in the train dataset (Condition X). So an adversary who observes the synthetic dataset won't learn the true value of the sensitive feature. In this case, can we say that a low k value for the synthetic dataset doesn't necessarily imply weaker privacy?
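
To make the scenario concrete, here is a minimal pandas sketch. It assumes the textbook definition of k-anonymity (the size of the smallest group of rows sharing the same quasi-identifier values), which may not be how synthcity computes its metric. The helper `k_anonymity` is hypothetical, and only the two train rows shown above are included; the elided rows are left out.

```python
import pandas as pd

# Quasi-identifiers from the example; Medical Condition is the sensitive feature.
QUASI = ["Age", "Gender", "Zip Code"]
SENSITIVE = "Medical Condition"


def k_anonymity(df: pd.DataFrame, quasi_identifiers: list) -> int:
    """Hypothetical helper: size of the smallest group of rows that share
    the same quasi-identifier values (textbook k-anonymity, not synthcity's API)."""
    return int(df.groupby(quasi_identifiers).size().min())


# Only the two train rows shown in the question (elided rows omitted).
train = pd.DataFrame(
    {
        "Age": [25, 30],
        "Gender": ["F", "M"],
        "Zip Code": [10000, 20000],
        SENSITIVE: ["Condition X", "Condition Y"],
    }
)

# The single synthetic row from the question.
synthetic = pd.DataFrame(
    {
        "Age": [25],
        "Gender": ["F"],
        "Zip Code": [10000],
        SENSITIVE: ["Condition Y"],
    }
)

k = k_anonymity(synthetic, QUASI)

# Link synthetic rows to train rows through the quasi-identifiers, then check
# whether any linked pair carries the same sensitive value.
linked = synthetic.merge(train, on=QUASI, suffixes=("_syn", "_train"))
sensitive_disclosed = bool(
    (linked[f"{SENSITIVE}_syn"] == linked[f"{SENSITIVE}_train"]).any()
)

print(f"k = {k}, linked rows = {len(linked)}, sensitive value disclosed = {sensitive_disclosed}")
```

Under these assumptions the synthetic row is unique on its quasi-identifiers and links to one train row, yet the linked sensitive values differ, which is the situation the question is about.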

Are there any recommended guidelines for interpreting synthcity's k-anonymity metric?

Thank you!

Apr 26 '24 19:04 amad-person
