Privacy Leakage at low sample size
🐛 Bug
When using opacus at low sample sizes (~2-3 samples), I managed to leak more privacy than the accountant reports:
Link: https://colab.research.google.com/drive/1gZVrg9kPIWjibApBkEnKNQqaIn8kUySs?usp=sharing
The empirical privacy estimate is computed as in: https://proceedings.neurips.cc/paper/2020/file/fc4ddc15f9f4b4b06ef7844d6bb53abf-Paper.pdf
The idea is as follows:
- Craft worst-case neighboring datasets D and D'
- Flip a coin b and run the mechanism (here, a linear regression trained via DP-SGD) on the corresponding dataset
- The adversary outputs a score (in this case, the value of a dimension that behaves close to worst-case)
- Select the best threshold the adversary could use to guess the value of b
- Estimate epsilon from the adversary's error rates, as in the paper above
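For reference, the steps above can be sketched as a small Monte Carlo loop. This is not the notebook's code: a 1-D Gaussian mechanism stands in for the DP-SGD-trained regression to keep the example self-contained, and the epsilon estimate is a simplified point estimate (the paper uses Clopper-Pearson confidence intervals to get a statistically valid lower bound).

```python
import numpy as np

rng = np.random.default_rng(0)

def mechanism(dataset, sigma=1.0):
    # Stand-in for DP-SGD: release a noisy sum of the worst-case dimension.
    return float(sum(dataset)) + rng.normal(0.0, sigma)

# Worst-case neighboring datasets (differ in one record).
D, D_prime = [0.0], [0.0, 1.0]

trials = 5000
labels = rng.integers(0, 2, size=trials)               # coin flips b
scores = np.array([mechanism(D_prime if b else D) for b in labels])

# Sweep thresholds and keep the best simplified point estimate
# eps_hat = max(log((1-FNR)/FPR), log((1-FPR)/FNR)).
best_eps = 0.0
for t in np.quantile(scores, np.linspace(0.01, 0.99, 99)):
    guess = scores > t                                 # adversary guesses b = 1
    fpr = np.mean(guess[labels == 0])                  # P(guess D' | ran on D)
    fnr = np.mean(~guess[labels == 1])                 # P(guess D  | ran on D')
    if fpr == 0 or fnr == 0:
        continue                                       # avoid log(0) at extremes
    best_eps = max(best_eps,
                   np.log((1 - fnr) / fpr),
                   np.log((1 - fpr) / fnr))

print(f"empirical epsilon estimate: {best_eps:.2f}")
```

With opacus in the loop, `mechanism` would instead train the model on the chosen dataset and return the adversary's score from the released weights.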
When the cardinality is high, the empirical estimate matches the accountant, but at low sample size the guarantee is violated (see the last print: opacus reports epsilon ≈ 1.2, while I empirically leak ≈ 2.5).
I shared this with @alexandresablayrolles, and he mentioned that the problem might be that, in this scenario, opacus leaks the cardinality of the underlying dataset/batch (which is itself private).
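A toy illustration of that hypothesis (my assumption, not the notebook's setup): if training runs for a fixed number of epochs with a fixed batch size, the number of optimizer steps depends on the dataset size, so an adversary observing the training trace can distinguish |D| = 2 from |D'| = 3 before even looking at the weights.

```python
import math

# Hypothetical: one epoch with a fixed batch size reveals n through
# the step count ceil(n / batch_size), which is not protected by the
# per-step DP-SGD noise.
batch_size = 2
steps = {n: math.ceil(n / batch_size) for n in (2, 3)}
print(steps)  # {2: 1, 3: 2}: the neighboring datasets are distinguishable
```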
I am curious to hear further thoughts/feedback and would be happy to help patch this.