differential-privacy
Clearly distinguish private data and privatized data by using different variable naming conventions on each side of the noise barrier.
In reviewing the source code, it is not clear which variables refer to private data and which refer to privatized data. I suggest that you clearly distinguish these in the source code, perhaps by using different variable naming conventions.
Can you be a little more specific? Which variables are you talking about? Can you provide an example?
Sure. Consider https://github.com/google/differential-privacy/blob/master/differential_privacy/algorithms/bounded-sum.h
The same naming convention is used for private (confidential) variables and for privatized variables that have had noise added. This makes the code harder to read and audit. For example, in this line:
void AddEntry(const T& t) override {
Is t a confidential value or a non-confidential value?
How about here:
base::StatusOr<Output> GenerateResult(double privacy_budget) override {
  DCHECK_GT(privacy_budget, 0.0)
      << "Privacy budget should be greater than zero.";
  if (privacy_budget == 0.0) return Output();
  Output output;
  double sum = 0;
  double remaining_budget = privacy_budget;
Is the value privacy_budget confidential or public? It's probably epsilon, so it's probably public, but I can't tell that from the naming convention.
What about sum? Is that public or confidential?
Interesting suggestion. We haven't given much thought to this sort of style/convention, but I can see how it would be useful. Do you know if there are any existing examples of this sort of convention in differential privacy code? We can come up with our own, but I'd like to avoid proliferating standards for this, if possible.
Sure. Check out https://github.com/uscensusbureau/census2020-das-2010ddp/blob/803126100083f6811aaf5bcb0be79c4fc7b1a148/das_decennial/programs/engine/primitives.py#L132
It's not quite the naming approach I recommend here. However, you will see code like this:
shape = np.shape(true_answer)
# TODO: Implement CSPRNG and floating point
self.protected_answer = prng.laplace(loc=true_answer, scale=self.scale, size=shape)
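
Here the true_ and protected_ prefixes show which side of the noise barrier each value sits on. To make the suggestion concrete, here is a minimal sketch of a bounded sum that applies such prefixes consistently. Everything in it is hypothetical: the function names, the true_/public_/protected_ prefixes, and the use of Python's standard random module are my assumptions for illustration, not code from either repository, and the noise sampling is not suitable for production (no CSPRNG, naive floating point).

```python
import random


def laplace_noise(public_scale, rng):
    """Sample Laplace(0, public_scale) noise.

    The difference of two independent Exp(1) draws follows Laplace(0, 1).
    Illustration only: this is neither a CSPRNG nor floating-point safe.
    """
    return public_scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def bounded_sum(true_values, public_epsilon, public_lower, public_upper, rng=None):
    """Return a noisy sum; prefixes mark each variable's side of the noise barrier.

    true_*      confidential, must never be returned or logged
    public_*    parameters that are safe to publish
    protected_* values that already have noise added
    """
    if rng is None:
        rng = random.Random()

    # Confidential side: clamp each record into the public bounds, then sum.
    true_clamped = [min(max(v, public_lower), public_upper) for v in true_values]
    true_sum = sum(true_clamped)

    # Public side: adding or removing one record changes the sum by at most this much.
    public_sensitivity = max(abs(public_lower), abs(public_upper))
    public_scale = public_sensitivity / public_epsilon

    # Noise barrier: only protected_* values leave this function.
    protected_sum = true_sum + laplace_noise(public_scale, rng)
    return protected_sum
```

With names like these, a reviewer can audit the function by checking that no true_ variable appears in a return statement or a log call.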