mathnet-numerics icon indicating copy to clipboard operation
mathnet-numerics copied to clipboard

Is Mean for NegativeBinomial distribution correct?

Open dsyme opened this issue 8 years ago • 6 comments

Playing around with distributions and it seems that the Mean of the NegativeBinomial distribution doesn't match the mean of samples.

open MathNet.Numerics
let b7 = Distributions.NegativeBinomial(2.0, 0.5) 

/// This gives about 4.0
[ for i in 0 .. 100 -> b7.Sample() ]  |> List.averageBy float

//This says 2.0
b7.Mean

dsyme avatar Dec 05 '16 22:12 dsyme

That looks suspicious, thanks for reporting!

cdrnet avatar Dec 05 '16 22:12 cdrnet

It turns out the mean is correct, but the samples return the number of trials (for reaching r successes) instead of the number of failures (until r successes are reached). Both are common definitions of the distribution, but they of course should be consistent within the implementation. We chose the second definition since it can be extended to all positive real numbers of r instead of just integers.

Quick workaround until it is fixed: subtract r (here: 2.0) from all the generated samples.

Unfortunately since r is real, we cannot actually subtract it from our samples since the discrete distribution interface assumes samples are integers. We may need to split this distribution if one accepting only integer r and thus also samples integers, and an extended one for real r with another interface.

cdrnet avatar Dec 07 '16 06:12 cdrnet

I'd like to point out that the quick work-around doesn't seem to work. In the example given by dsyme if I run this on my PC I get positive integer and 0 samples. I don't follow how this can be interpreted as "the number of trials (for reaching r successes)". Knocking off 2.0 from a 0 sample, regardless of types, doesn't seem to make sense.

Can someone elaborate?

thamilto avatar Jul 02 '18 17:07 thamilto

I might have pinpointed the problem in NegativeBinomial, the code is:

        static int SampleUnchecked(System.Random rnd, double r, double p)
        {
            var lambda = Gamma.SampleUnchecked(rnd, r, p);
            var c = Math.Exp(-lambda);
            var p1 = 1.0;
            var k = 0;
            do
            {
                k = k + 1;
                p1 = p1*rnd.NextDouble();
            }
            while (p1 >= c);
            return k - 1;
        }

The first code line should be replaced by

var lambda = Gamma.SampleUnchecked(rnd, r, (1 - p)/p);

Also, it would be nice to point-out in the code that the second part is actually drawing a Poisson sample (which could benefit from the optimized implementation in PoissonDistribution.cs which distinguish small lambdas from big lambdas).

vermorel avatar Mar 08 '19 20:03 vermorel

Is there a plan to implement the fix suggested by @vermorel above? This would solve a similar issue I'm seeing.

EDIT: Correction - having given further thought it should be p/(1-p) not (1-p)/p

tlatarche avatar Aug 17 '21 08:08 tlatarche

The sample method is simply incorrect, @tlatarche's proposal is correct. Just to recall the relationships: With the chosen definition (r is number of successes, p is probability of success), the mean is given by (1-p)/p*r and the NB distribution can be expressed as Poisson distribution with parameter being a sample from the Gamma distribution with parameters r and p/(1-p). The current sampling implementation can be interpreted as using instead a p', which is implicitly defined by p = p'/(1-p'). With a little bit of calculus, it can be shown that this leads to a sampling mean which differs by the true mean by a factor of 1-p, which is consistent with @dsyme's observation.

Arlofin avatar Sep 29 '22 00:09 Arlofin