SACRO-ML
Likelihood attack reproducibility
How can the data splits used by LIRA be made reproducible? I noticed that the code uses random sampling to generate the indices that select rows of data (see line 303):
these_idx = np.random.choice(indices, n_train_rows, replace=False)
Would saving "these_idx" be enough for the splits to count as reproducible? Alternatively, the problem could be solved by saving the probabilities computed by the shadow models, although that could take up a lot of disk space.
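A minimal sketch of one option, assuming the split can be drawn from a seeded NumPy Generator instead of the global np.random state. With a fixed seed, the same indices are regenerated on every run, so neither the indices nor the shadow-model probabilities strictly need to be stored. The helper name `draw_split` and its `seed` argument are hypothetical and not part of SACRO-ML. Calling np.random.seed once before the splits would also make them repeatable, but a local Generator does not depend on other code that touches the global state.

```python
import io

import numpy as np


def draw_split(indices: np.ndarray, n_train_rows: int, seed: int) -> np.ndarray:
    """Hypothetical helper: draw n_train_rows indices without replacement.

    A local, seeded Generator is used instead of the global np.random state,
    so the result depends only on the arguments.
    """
    rng = np.random.default_rng(seed)
    return rng.choice(indices, n_train_rows, replace=False)


indices = np.arange(100)
first = draw_split(indices, 10, seed=42)
second = draw_split(indices, 10, seed=42)

# Same seed, same split: the indices can be regenerated instead of stored.
assert np.array_equal(first, second)

# If the indices are stored anyway, they cost one integer per selected row.
buffer = io.BytesIO()
np.save(buffer, first)
buffer.seek(0)
restored = np.load(buffer)
```

Storing the seed (or the small index arrays) is much cheaper than storing per-row shadow-model probabilities, which grow with the number of shadow models and rows.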
I also noticed a comment on lines 324 to 326 saying that some classes might not be represented in a split. Can anything be done to minimize this?
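One possible approach, sketched under the assumption that the class labels `y` are available when the split is drawn: pick one random row from each class first, then fill the rest of the split uniformly from the remaining rows. The helper `split_with_all_classes` is hypothetical and is not how SACRO-ML currently samples. It guarantees coverage rather than exact class proportions; scikit-learn's `train_test_split(..., stratify=y)` or `StratifiedShuffleSplit` are alternatives if proportional stratification is preferred.

```python
import numpy as np


def split_with_all_classes(y, n_train_rows, seed):
    """Hypothetical helper: draw n_train_rows row indices so that every class
    in y appears at least once; the remaining rows are drawn uniformly."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    if not len(classes) <= n_train_rows <= len(y):
        raise ValueError("n_train_rows must be between the number of classes and len(y)")
    # One randomly chosen row per class guarantees that every class is present.
    anchors = np.array([rng.choice(np.flatnonzero(y == c)) for c in classes])
    # Fill the rest of the split uniformly from the rows not yet chosen.
    rest = np.setdiff1d(np.arange(len(y)), anchors)
    fill = rng.choice(rest, n_train_rows - len(classes), replace=False)
    return np.concatenate([anchors, fill])


# Imbalanced toy labels: class 2 has only 5 of 100 rows.
y = np.array([0] * 50 + [1] * 45 + [2] * 5)
idx = split_with_all_classes(y, 10, seed=0)
```

One trade-off: a class with a single row would then be in every shadow training set, so there would be no shadow models in which that row is held out.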
If we make the data splits reproducible, does LIRA still make sense?
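On the last question, a minimal sketch assuming a single master seed is recorded for the whole attack. Each shadow model still gets its own independent random split, which is what LIRA's comparison of "in" and "out" shadow models relies on, but the full set of splits can be regenerated from that one seed. The names `shadow_splits`, `master_seed`, and `n_shadow_models` are hypothetical.

```python
import numpy as np


def shadow_splits(n_rows, n_train_rows, n_shadow_models, master_seed):
    """Hypothetical helper: one independent, reproducible split per shadow model."""
    # SeedSequence.spawn derives statistically independent child seeds.
    children = np.random.SeedSequence(master_seed).spawn(n_shadow_models)
    indices = np.arange(n_rows)
    return [
        np.random.default_rng(child).choice(indices, n_train_rows, replace=False)
        for child in children
    ]


run_a = shadow_splits(200, 100, n_shadow_models=8, master_seed=1234)
run_b = shadow_splits(200, 100, n_shadow_models=8, master_seed=1234)

# Re-running with the same master seed reproduces every split exactly.
same = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))

# Count how many shadow models include each row; on average about half do.
in_counts = np.zeros(200, dtype=int)
for split in run_a:
    in_counts[split] += 1
```

Seeding fixes which random splits are drawn; it does not remove the randomness across shadow models that the attack depends on.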