
Investigate performance limits of `DisclosureProtection` metric

Open frances-h opened this issue 1 year ago • 0 comments

Problem Description

Currently, the `DisclosureProtection` metric warns about poor performance when the input data has more than 50,000 rows. This threshold was chosen without investigating how the metric actually performs. It'd be helpful to know how the metric's performance changes with the size of the input, so that we can warn the user about possible poor performance earlier and suggest an alternative metric.

Expected behavior

Investigate the performance of the `DisclosureProtection` metric as a function of the number of rows in the input data, the number of known and sensitive columns, and the number of unique discrete values in those columns. Also test each of the different CAP methods.
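
One possible way to structure the investigation is a small timing harness that sweeps over those dimensions. The sketch below is only an illustration: `make_rows`, `benchmark`, and `placeholder_metric` are hypothetical helpers that are not part of SDMetrics, and the stand-in metric does no real work so that the harness runs on its own.

```python
import itertools
import random
import time


def make_rows(num_rows, num_columns, num_unique, seed=0):
    """Build random discrete data as a dict of column name -> list of ints."""
    rng = random.Random(seed)
    return {
        f"col_{i}": [rng.randrange(num_unique) for _ in range(num_rows)]
        for i in range(num_columns)
    }


def benchmark(metric_fn, row_counts, known_counts, sensitive_counts,
              unique_counts, methods):
    """Time metric_fn once per grid combination; return one record per run."""
    results = []
    grid = itertools.product(
        row_counts, known_counts, sensitive_counts, unique_counts, methods
    )
    for num_rows, num_known, num_sensitive, num_unique, method in grid:
        num_columns = num_known + num_sensitive
        # Different seeds so the "real" and "synthetic" tables differ.
        real = make_rows(num_rows, num_columns, num_unique, seed=0)
        synthetic = make_rows(num_rows, num_columns, num_unique, seed=1)
        columns = list(real)
        known, sensitive = columns[:num_known], columns[num_known:]

        start = time.perf_counter()
        metric_fn(real, synthetic, known, sensitive, method)
        elapsed = time.perf_counter() - start

        results.append({
            "num_rows": num_rows,
            "num_known": num_known,
            "num_sensitive": num_sensitive,
            "num_unique": num_unique,
            "method": method,
            "seconds": elapsed,
        })
    return results


def placeholder_metric(real, synthetic, known, sensitive, method):
    """Hypothetical stand-in; swap in a wrapper around the real metric."""
    return 0.0


if __name__ == "__main__":
    records = benchmark(
        placeholder_metric,
        row_counts=[1_000, 10_000],
        known_counts=[1, 3],
        sensitive_counts=[1],
        unique_counts=[5, 50],
        methods=["cap", "zero_cap", "generalized_cap"],
    )
    for record in records:
        print(record)
```

To measure the real metric, `placeholder_metric` would be replaced by a wrapper that converts both dicts to `pandas.DataFrame` objects and calls `DisclosureProtection.compute` with `real_data`, `synthetic_data`, `known_column_names`, `sensitive_column_names`, and `computation_method`. Those argument names and the three method names in the grid are taken from the SDMetrics documentation as I understand it, so they should be checked against the installed version.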

Once we have a good understanding of the performance, we should update the warning in `DisclosureProtection` to reflect the results of the investigation.

frances-h avatar Dec 10 '24 15:12 frances-h