
[BUG] Running ProfitabilityProsecutor with an already suppressed dataset (GUI and API)

Open mhalilovic opened this issue 1 year ago • 5 comments

I encounter an error when anonymizing a fully suppressed dataset using the API, with similar behavior observed in the GUI.

Example to reproduce using the ARX GUI: Import a fully suppressed dataset (all values are *), apply generalization hierarchies with just one level (*), and configure Profitability Prosecutor with a suppression limit of 100%.

When attempting to anonymize, I get the message: Cannot anonymize data: Value (NaN) out of range [0,1]

Description of the API behavior:
The same issue appears to occur when using the API with Java. Here is part of my logs:

Caused by: java.lang.IllegalStateException: Value (NaN) out of range [0,1]
    at org.deidentifier.arx.metric.v2.MetricSDNMEntropyBasedInformationLoss.getEntropyBasedInformationLoss(MetricSDNMEntropyBasedInformationLoss.java:109)
    at org.deidentifier.arx.criteria.ProfitabilityProsecutor.isAnonymous(ProfitabilityProsecutor.java:121)
    at org.deidentifier.arx.framework.check.groupify.HashGroupify.isPrivacyModelFulfilled(HashGroupify.java:758)
    at org.deidentifier.arx.framework.check.groupify.HashGroupify.analyzeWithEarlyAbort(HashGroupify.java:653)
    at org.deidentifier.arx.framework.check.groupify.HashGroupify.stateAnalyze(HashGroupify.java:447)
    at org.deidentifier.arx.framework.check.TransformationChecker.check(TransformationChecker.java:217)
    at org.deidentifier.arx.framework.check.TransformationChecker.check(TransformationChecker.java:170)
    at org.deidentifier.arx.algorithm.FLASHAlgorithmImpl.traverse(FLASHAlgorithmImpl.java:128)
    at org.deidentifier.arx.ARXAnonymizer.anonymize(ARXAnonymizer.java:777)
    at org.deidentifier.arx.ARXAnonymizer.anonymize(ARXAnonymizer.java:226)
    at org.deidentifier.arx.distributed.ARXWorkerLocal$1.call(Unknown Source)
    at org.deidentifier.arx.distributed.ARXWorkerLocal$1.call(Unknown Source)
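
For readers unfamiliar with how a NaN can end up in a range check like the one in the trace, here is a minimal, self-contained Java sketch. It is not ARX code: the class `NanRangeSketch` and the methods `normalize` and `checkRange` are hypothetical names used only for illustration, and the assumption that the value comes from a 0/0 division is a guess about the degenerate case, not something confirmed by the log.

```java
public class NanRangeSketch {

    // Hypothetical helper: normalizes a loss into [0, 1] by dividing by its maximum.
    // Assumption: for a degenerate input both values may be zero.
    static double normalize(double loss, double maxLoss) {
        return loss / maxLoss; // 0.0 / 0.0 evaluates to NaN in IEEE 754 arithmetic
    }

    // Hypothetical range check. NaN fails it because every ordered
    // comparison involving NaN evaluates to false.
    static double checkRange(double value) {
        if (!(value >= 0d && value <= 1d)) {
            throw new IllegalStateException("Value (" + value + ") out of range [0,1]");
        }
        return value;
    }

    public static void main(String[] args) {
        // Regular case: a well-defined ratio passes the check.
        System.out.println(checkRange(normalize(1d, 4d)));

        // Degenerate case: 0/0 produces NaN, which the check rejects.
        try {
            checkRange(normalize(0d, 0d));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that a floating-point division with a zero numerator and a zero denominator does not throw in Java; it silently yields NaN, which then surfaces later at the first range check.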

mhalilovic avatar Feb 09 '24 13:02 mhalilovic

This should be relatively easy to fix. Can you please investigate the semantics of the number [0, 1] usually returned from getEntropyBasedInformationLoss? Is it 0 for no information loss and 1 for maximum information loss, or the other way around (0 for maximum information loss and 1 for no information loss)? Please let me know here.

prasser avatar Feb 09 '24 14:02 prasser

0 for no information loss and 1 for maximum information loss
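
Given these semantics, one way to handle the degenerate case would be to return an explicit fallback whenever the normalized value is undefined. The following Java sketch illustrates the idea under stated assumptions: `GuardedLossSketch` and `normalizeGuarded` are hypothetical names, the choice of fallback (0 for no loss or 1 for maximum loss) is left to the caller, and nothing here is claimed to match what ARX actually does.

```java
public class GuardedLossSketch {

    // Hypothetical guard: returns loss / maxLoss, or the given fallback
    // when the ratio is undefined (NaN), e.g. for 0.0 / 0.0.
    static double normalizeGuarded(double loss, double maxLoss, double fallback) {
        double value = loss / maxLoss;
        return Double.isNaN(value) ? fallback : value;
    }

    public static void main(String[] args) {
        // Well-defined ratio: the fallback is ignored.
        System.out.println(normalizeGuarded(1d, 4d, 1d));

        // Undefined ratio: the caller decides what the degenerate case means.
        // Here 1 is passed, i.e. "maximum information loss" under the semantics above.
        System.out.println(normalizeGuarded(0d, 0d, 1d));
    }
}
```

Passing the fallback as a parameter keeps the decision about what a fully suppressed dataset should report visible at the call site instead of burying it in the helper.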

mhalilovic avatar Feb 09 '24 15:02 mhalilovic

Please check whether the recent commit 984f38f fixes the problem.

prasser avatar Feb 09 '24 16:02 prasser

My issue with the API is resolved. Thank you!

The GUI also "anonymizes" the dataset now without a message. Most quality models have NaN or N/A values in the Quality models tab now. I do not know if this is expected behavior.

mhalilovic avatar Feb 09 '24 18:02 mhalilovic

> Most quality models have NaN or N/A values in the Quality models tab now. I do not know if this is expected behavior.

Are you sure that this is caused by this commit? Please check.

prasser avatar Feb 09 '24 21:02 prasser