Privacy Leakage at low sample size
🐛 Bug
When using opacus at low sample sizes (~2-3 samples), I managed to leak more privacy than the accountant reports:
Link: https://colab.research.google.com/drive/1gZVrg9kPIWjibApBkEnKNQqaIn8kUySs?usp=sharing
The empirical privacy estimate is computed as in: https://proceedings.neurips.cc/paper/2020/file/fc4ddc15f9f4b4b06ef7844d6bb53abf-Paper.pdf
The idea is as follows:
- Craft worst-case neighboring datasets D and D'
- Flip a coin b and run the mechanism (here, a linear regression trained via DP-SGD) on the corresponding dataset
- The adversary outputs a score (in this case, the value of a dimension that behaves close to worst-case)
- Select the best threshold the adversary could use to guess the value of b
- Estimate epsilon from the adversary's error rates, as in the paper above
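For reference, the steps above can be sketched as a small Monte Carlo loop. This is not the notebook's code: a 1-D Gaussian mechanism stands in for the DP-SGD-trained regression to keep the example self-contained, and the epsilon estimate is a simplified point estimate (the paper uses Clopper-Pearson confidence intervals to get a statistically valid lower bound).

```python
import numpy as np

rng = np.random.default_rng(0)

def mechanism(dataset, sigma=1.0):
    # Stand-in for DP-SGD: release a noisy sum of the worst-case dimension.
    return float(sum(dataset)) + rng.normal(0.0, sigma)

# Worst-case neighboring datasets (differ in one record).
D, D_prime = [0.0], [0.0, 1.0]

trials = 5000
labels = rng.integers(0, 2, size=trials)               # coin flips b
scores = np.array([mechanism(D_prime if b else D) for b in labels])

# Sweep thresholds and keep the best simplified point estimate
# eps_hat = max(log((1-FNR)/FPR), log((1-FPR)/FNR)).
best_eps = 0.0
for t in np.quantile(scores, np.linspace(0.01, 0.99, 99)):
    guess = scores > t                                 # adversary guesses b = 1
    fpr = np.mean(guess[labels == 0])                  # P(guess D' | ran on D)
    fnr = np.mean(~guess[labels == 1])                 # P(guess D  | ran on D')
    if fpr == 0 or fnr == 0:
        continue                                       # avoid log(0) at extremes
    best_eps = max(best_eps,
                   np.log((1 - fnr) / fpr),
                   np.log((1 - fpr) / fnr))

print(f"empirical epsilon estimate: {best_eps:.2f}")
```

With opacus in the loop, `mechanism` would instead train the model on the chosen dataset and return the adversary's score from the released weights.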
When the cardinality is high, the empirical estimate matches the accountant, but at low sample size the guarantee is violated (see the last print: opacus reports epsilon ≈ 1.2, while I empirically leak ≈ 2.5).
I shared this with @alexandresablayrolles, and he mentioned that the problem might be that, in this scenario, opacus leaks the cardinality of the underlying dataset/batch (which is itself private).
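A toy illustration of that hypothesis (my assumption, not the notebook's setup): if training runs for a fixed number of epochs with a fixed batch size, the number of optimizer steps depends on the dataset size, so an adversary observing the training trace can distinguish |D| = 2 from |D'| = 3 before even looking at the weights.

```python
import math

# Hypothetical: one epoch with a fixed batch size reveals n through
# the step count ceil(n / batch_size), which is not protected by the
# per-step DP-SGD noise.
batch_size = 2
steps = {n: math.ceil(n / batch_size) for n in (2, 3)}
print(steps)  # {2: 1, 3: 2}: the neighboring datasets are distinguishable
```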
I am curious to hear further thoughts/feedback and would be happy to help patch this.