
Add madeSpLogDensityOfDataBound methods for the categorical family of distributions

Open axch opened this issue 9 years ago • 3 comments

[Separated from #455].

(The "categorical family" is defined for the present purposes in #454).

All those SPs that track sufficient statistics have easily computable non-trivial upper bounds on the probability of the aux.

  • First, consider the non-conjugate case. Here, all the trials are independent conditioned on the weight(s). The optimal weights are exactly the empirical frequencies, and the probability of the counts under those weights is computable (and is less than 1 if more than one outcome appears); see the sketch after this list.
  • In the collapsed conjugate case, the prior can't do any better than to concentrate all its mass on those optimal weights, so the empirical-frequency bound remains valid (if not necessarily optimally tight).
  • This observation is actually general for all collapsed conjugate bounds.
  • Once #455 is done, the same will apply to the bounds reported by uncollapsed conjugate SPs as well.
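
As a concrete illustration, here is a minimal sketch of the empirical-frequency bound on the log probability of the sufficient statistics. This is not Venturecxx code; the function name and the counts representation are assumptions.

```python
import math

def log_density_of_counts_bound(counts):
    """Upper bound on the log probability of the observed counts, obtained by
    evaluating them at the maximum-likelihood weights, i.e. the empirical
    frequencies.  `counts` maps each outcome to the number of times it occurred."""
    n = sum(counts.values())
    if n == 0:
        return 0.0  # log(1): no trials, so the bound is vacuous
    # Under weights p_i = n_i / n, each of the n_i trials of outcome i
    # contributes log(n_i / n) to the log probability.
    return sum(c * (math.log(c) - math.log(n)) for c in counts.values() if c > 0)

# For example, {'a': 2, 'b': 1} gives log((2/3)^2 * (1/3)), which is below log(1),
# whereas a single repeated outcome gives a bound of exactly log(1) = 0.
```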

axch commented Mar 20 '16 23:03

The empirical-frequency bound remains correct in the case of a symmetric Dirichlet prior, but since that prior is so constrained, there is hope of finding something considerably tighter. Preliminary thinking turns up some potentially usable limiting cases (a sketch for checking them numerically follows the list). Suppose there are k options and n trials. Then:

  • If n = 0, the probability is 1 for all alpha.
  • If n > 0, probability <= 1/k because the first trial of a symmetric Dirichlet categorical unavoidably has probability 1/k.
  • If all the trials produced the same result, as alpha -> 0, probability -> 1/k.
  • If at least two trials produced distinct results, as alpha -> 0, probability -> 0.
  • As alpha -> infinity, probability -> 1/k^n.
    • If the distribution on results is exactly uniform, this is the maximum, because it is the empirical frequency bound for this case.
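
These limits can be checked numerically against the exact marginal probability of a sequence of results under a symmetric Dirichlet(alpha) prior. A minimal sketch, not Venturecxx code; the function name and argument conventions are assumptions:

```python
from math import lgamma, exp

def log_marginal_of_sequence(counts, k, alpha):
    """Exact log probability of one particular sequence of results with the
    given per-outcome counts, marginalizing out a symmetric Dirichlet(alpha)
    prior over k outcomes."""
    n = sum(counts)
    return (lgamma(k * alpha) - lgamma(n + k * alpha)
            + sum(lgamma(c + alpha) - lgamma(alpha) for c in counts))

# All trials produced the same result: the probability approaches 1/k as alpha -> 0.
print(exp(log_marginal_of_sequence([5, 0, 0], k=3, alpha=1e-6)))   # ~ 1/3
# As alpha -> infinity the probability approaches 1/k^n.
print(exp(log_marginal_of_sequence([2, 2, 2], k=3, alpha=1e6)))    # ~ 1/3^6
```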

axch commented Mar 20 '16 23:03

In fact, if there are at least two distinct results, the probability can't be more than 1/k^2. Why? By exchangeability, we can take the second trial to have produced a different result from the first. Its probability is alpha/(1 + k*alpha) <= 1/k, which combines with the 1/k contributed by the first trial.

Let the number of distinct results be m. Iterating the above argument, the probability is at most 1/k^m. Now, take the first m trials to have produced all different results and consider the probability of the remaining trials. These are drawn from a distribution no more favorable than a symmetric Dirichlet on now m possible outcomes (with pseudocounts at least 1, at that). So we can iterate the above to compute a non-trivial upper bound on the probability.

If m proves equal to k at every iteration of the previous argument, we will reach the empirical frequency bound of 1/k^n. However, the bounding argument is somewhat unsatisfying if m is ever strictly less than k. The probability of the first m results being distinct will approach 1/k^m from below as alpha approaches infinity; whereas if m < k, the prior facing the remaining n-m trials will only approach a symmetric Dirichlet on m outcomes as alpha approaches zero, and priors given by all other alpha will be worse (because they will spend more mass predicting the k-m outcomes that will never happen).
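
A minimal sketch of that iteration (not Venturecxx code; the function name and counts representation are assumptions) might look like:

```python
import math

def log_density_bound_via_distinct_results(counts, k):
    """Upper bound on the log probability of the counts under a symmetric
    Dirichlet prior on k outcomes, by iterating the argument above: the first
    trial of each distinct outcome contributes at most 1/k, and the remaining
    trials face a prior no more favorable than a symmetric Dirichlet on the
    outcomes already seen."""
    observed = [c for c in counts if c > 0]
    m = len(observed)
    if m == 0:
        return 0.0  # no trials left: probability 1
    # The first occurrences of the m distinct outcomes have probability <= 1/k^m.
    head = -m * math.log(k)
    # The remaining trials live on (at most) the m outcomes already seen.
    tail = log_density_bound_via_distinct_results([c - 1 for c in observed], m)
    return head + tail

# With counts [2, 1] and k = 2 this gives 1/8, the supremum over alpha of the
# exact marginal; with uniform counts it recovers the 1/k^n bound.
```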

axch commented Mar 20 '16 23:03

Now that #455 is done, the uncollapsed case looks like it will use the same bounds methods as the collapsed case, exposing them through the weightBound method of the PosteriorAAALKernel.
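
Purely as an illustration of the intended wiring, not actual Venturecxx code: the names madeSpLogDensityOfDataBound, weightBound, and PosteriorAAALKernel come from this issue, but the signatures and surrounding structure below are assumptions.

```python
class CollapsedCategoricalMakerSP(object):  # hypothetical maker SP
    def madeSpLogDensityOfDataBound(self, aux):
        # Bound on the log probability of the sufficient statistics held in
        # `aux`, e.g. one of the bounds sketched in the comments above.
        return log_density_of_counts_bound(aux.counts)

class PosteriorAAALKernel(object):  # signature is an assumption
    def __init__(self, maker_sp):
        self.maker_sp = maker_sp

    def weightBound(self, aux):
        # The uncollapsed case reuses the collapsed bound: resimulating the
        # latent weights from their posterior cannot make the data more
        # probable than the collapsed bound allows.
        return self.maker_sp.madeSpLogDensityOfDataBound(aux)
```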

axch commented Mar 28 '16 16:03