joshua-oss
Two more enhancements should be made in this PR: 1. Odometer should allow serialization and rehydration. This will enable stateless services to keep track of cumulative budget spend. 2. We...
Partially addressed in #486
I'll take a look at this later today. This is in the code that calculates noisy counts of frequencies from the categorical columns for sampling. We use category_epsilon_pct percent of...
I think this may be related to the `preprocessor_eps` parameter that was recently added to support preprocessing of continuous values. The budget for preprocessing defaults to 1.0, and is subtracted...
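To make the budget arithmetic concrete, here is a minimal sketch. It assumes the preprocessing share is subtracted from the total epsilon passed by the caller. `remaining_epsilon` is a hypothetical helper for illustration, not part of the `smartnoise-synth` API, and raising on an exhausted budget is this sketch's own choice rather than confirmed library behavior.

```python
def remaining_epsilon(total_epsilon: float, preprocessor_eps: float = 1.0) -> float:
    """Return the epsilon left for training after reserving the preprocessing share.

    Assumption: the preprocessing budget is taken out of the caller's total epsilon.
    """
    remaining = total_epsilon - preprocessor_eps
    if remaining <= 0:
        # Hypothetical guard: nothing would be left for the synthesizer itself.
        raise ValueError(
            f"preprocessor_eps={preprocessor_eps} consumes the whole budget "
            f"(total epsilon={total_epsilon})"
        )
    return remaining


# With a total of 3.0 and the default preprocessing share of 1.0,
# 2.0 remains for the later steps.
print(remaining_epsilon(3.0))
```

If this assumption holds, a total epsilon of 1.0 or less would leave nothing for the later steps unless `preprocessor_eps` is lowered explicitly.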
I have released `smartnoise-synth==0.2.8.1` with some minor diagnostic messages that clarify how much epsilon is available for each step. That might help diagnose what's happening. That version also updates to...
Note that `log_frequency` defaults to False, and we throw a privacy warning if the caller asks for log frequencies. Also note that batches typically include samples that differ in every...
When `log_frequency` is `False`, the sampler is effectively sampling uniformly from the dataset, just in a roundabout manner. The steps taken are: 1. Select a categorical column...
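A small check of why this can come out uniform. The step list above is cut off, so this sketch assumes the remaining steps are: pick a value of the chosen column in proportion to its exact (non-noisy) frequency, then pick a row uniformly among the rows holding that value. `row_probabilities` is a hypothetical helper, not code from the sampler.

```python
from collections import Counter
from fractions import Fraction


def row_probabilities(column):
    """Exact probability of drawing each row under the assumed two-step scheme.

    Step A (assumed): pick a value with probability count(value) / n.
    Step B (assumed): pick one of the count(value) rows holding it, uniformly.
    """
    n = len(column)
    counts = Counter(column)
    return [Fraction(counts[v], n) * Fraction(1, counts[v]) for v in column]


column = ["a", "a", "b", "c", "a"]
probs = row_probabilities(column)

# count(value)/n * 1/count(value) collapses to 1/n for every row.
print(all(p == Fraction(1, len(column)) for p in probs))
```

Because each column yields the same uniform distribution over rows, choosing the column at random first does not change the result under these assumptions. Noisy frequencies would make it only approximately uniform.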
That's correct, though we still create the random conditional mask which selects a different categorical value to serve as a "label" for each row in the batch. The technique of...
Yes, I think that would work the best for the heterogeneous odometer, and for the homogeneous odometer, we just need the 'k' and the privacy parameters. At some point, we...
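For the homogeneous case, a rough sketch of what serialization and rehydration could look like if the state really is just `k` plus the privacy parameters. `HomogeneousOdometerState` and its methods are hypothetical names for illustration, not the existing Odometer API, and the per-query parameters are assumed to be a single `(epsilon, delta)` pair.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HomogeneousOdometerState:
    """Hypothetical state holder: per-query privacy parameters plus a query count."""

    epsilon: float  # per-query epsilon (assumed identical for every query)
    delta: float    # per-query delta (assumed identical for every query)
    k: int = 0      # number of queries recorded so far

    def record_query(self) -> None:
        """Count one more query at the fixed per-query parameters."""
        self.k += 1

    def to_json(self) -> str:
        """Serialize the state so a stateless service can persist it."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "HomogeneousOdometerState":
        """Rehydrate the state from a previously serialized payload."""
        return cls(**json.loads(payload))


state = HomogeneousOdometerState(epsilon=0.1, delta=1e-6)
state.record_query()
state.record_query()

payload = state.to_json()  # store between requests
restored = HomogeneousOdometerState.from_json(payload)
print(restored == state)
```

The heterogeneous odometer would need to persist more than a count, so this sketch covers only the homogeneous case.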