opendp icon indicating copy to clipboard operation
opendp copied to clipboard

Simultaneous accuracy for vector valued queries

Open Shoeboxam opened this issue 4 years ago • 0 comments

Accuracy statements on vector valued queries are a vector of individual confidence intervals, not a simultaneous confidence interval! This should be made clearer in the documentation. It would also be useful to add an estimator for a simultaneous confidence interval, parameterized by p dimensions. There are some nice simplifications here since the noise is isotropic and the underlying distribution is properly normal or laplacian, not t.

For the simultaneous gaussian, the test statistic would be Q/p, where Q ~ Chisq(p). This is just a simplification of Hotelling's T test. Since the noise is isotropic, the confidence region is spherical, and we can capture it with a single accuracy parameter.

The sum of laplacian random variables is more complicated. I can't really give an exact bound without a CAS, or baking formulas for each p (not sure if it's possible to simplify these expressions into one analytic expression), but Chebyshev's inequality can give us a loose bound. I gave a really rough outline below. The variance v of the sum of laplacians Y will be 2 * p * sigma^2, because the noise is uncorrelated. We can assume the noise is centered at zero wlog because accuracy is invariant to shift.

If D = |Y - x|, then:

alpha = 1 - P(D < acc)
     <= 1 - P(Y > acc) - P(Y < -acc) by cantelli
      = 1 - 2P(Y > acc) by symmetry
     <= 1 - 2v/(v + acc^2) by chebyshev

Which inverts to this:

acc = sqrt((alpha + 1) * v / (1 - alpha))
    = sqrt((alpha + 1) * 2 * p * sigma^2 / (1 - alpha))

I haven't really sanity checked any of this. Might be possible to make this tighter with a Chernoff bound instead.

Shoeboxam avatar Oct 14 '21 16:10 Shoeboxam