mathnet-numerics
mathnet-numerics copied to clipboard
Is Mean for NegativeBinomial distribution correct?
Playing around with distributions and it seems that the Mean of the NegativeBinomial distribution doesn't match the mean of samples.
open MathNet.Numerics
let b7 = Distributions.NegativeBinomial(2.0, 0.5)
/// This gives about 4.0
[ for i in 0 .. 100 -> b7.Sample() ] |> List.averageBy float
//This says 2.0
b7.Mean
That looks suspicious, thanks for reporting!
It turns out the mean is correct, but the samples return the number of trials (for reaching r successes) instead of the number of failures (until r successes are reached). Both are common definitions of the distribution, but they of course should be consistent within the implementation. We chose the second definition since it can be extended to all positive real numbers of r instead of just integers.
Quick workaround until it is fixed: subtract r (here: 2.0) from all the generated samples.
Unfortunately since r is real, we cannot actually subtract it from our samples since the discrete distribution interface assumes samples are integers. We may need to split this distribution if one accepting only integer r and thus also samples integers, and an extended one for real r with another interface.
I'd like to point out that the quick work-around doesn't seem to work. In the example given by dsyme if I run this on my PC I get positive integer and 0 samples. I don't follow how this can be interpreted as "the number of trials (for reaching r successes)". Knocking off 2.0 from a 0 sample, regardless of types, doesn't seem to make sense.
Can someone elaborate?
I might have pinpointed the problem in NegativeBinomial, the code is:
static int SampleUnchecked(System.Random rnd, double r, double p)
{
var lambda = Gamma.SampleUnchecked(rnd, r, p);
var c = Math.Exp(-lambda);
var p1 = 1.0;
var k = 0;
do
{
k = k + 1;
p1 = p1*rnd.NextDouble();
}
while (p1 >= c);
return k - 1;
}
The first code line should be replaced by
var lambda = Gamma.SampleUnchecked(rnd, r, (1 - p)/p);
Also, it would be nice to point-out in the code that the second part is actually drawing a Poisson sample (which could benefit from the optimized implementation in PoissonDistribution.cs which distinguish small lambdas from big lambdas).
Is there a plan to implement the fix suggested by @vermorel above? This would solve a similar issue I'm seeing.
EDIT: Correction - having given further thought it should be p/(1-p) not (1-p)/p
The sample method is simply incorrect, @tlatarche's proposal is correct. Just to recall the relationships: With the chosen definition (r is number of successes, p is probability of success), the mean is given by (1-p)/p*r and the NB distribution can be expressed as Poisson distribution with parameter being a sample from the Gamma distribution with parameters r and p/(1-p). The current sampling implementation can be interpreted as using instead a p', which is implicitly defined by p = p'/(1-p'). With a little bit of calculus, it can be shown that this leads to a sampling mean which differs by the true mean by a factor of 1-p, which is consistent with @dsyme's observation.