mathnet-numerics
mathnet-numerics copied to clipboard
Error in sampling from Negative Binomial
I'm using Math.Net to sample values from an overdispersed Poisson distribution. I'm doing this using the negative binomial link, as described here: https://stat.ethz.ch/pipermail/r-help/2002-June/022425.html
My code currently looks like this:
private static double ODPoisson(double lambda, double dispersion)
{
double p = 1 / (dispersion - 1);
double r = lambda * p;
if (dispersion == 1)
{
return Poisson.Sample(lambda);
}
else
{
return NegativeBinomial.Sample(r, p);
}
}
What I've found is that this works well for low values of lambda. As soon as I try to sample with a lambda of 1000 and a dispersion parameter of 2, the code simply 'hangs', i.e. method keeps running but no value is returned. I've even looped through this method to test various combinations of input parameters (lambda from 1 to 1000, dispersion = 2), and the code 'hangs' at different combinations every time. Sometimes it'll sample for all combinations up to lambda = 750, other times up to lambda = 500. This happens by simply re-running the console app and making no code changes.
I've included the 'IsValidParameterSet' check before every run, and even though the parameters are considered valid, the sample is still not generated. To further test whether the input parameters are valid, I've tested the same parameters with the NegativeBinomial.CDF method at the 50th percentile, and it returns a value every time.
Is there an error in the NegativeBinomial.Sample method? If not, how do I fix this problem?
Thanks for reporting this.
The current implementation is indeed not suitable for these parameters, as the acceptance area is not only getting very small, it is actually getting to zero (floating point precision) resulting in an infinite loop
Tasks:
- [ ] detect such cases and make sure we never end up in an infinite loop. This applies to a few other discrete distributions as well
- [ ] fallback to a modified sampling approach depending on the parameter range (see Poisson for example).
- [ ] NegativeBinomial, Poisson and a few other discrete distributions are currently excluded from the sample shape unit test. Add them back so we have all distributions covered again.
I noticed this is still an issue.
How about using the efficient methods described here: http://link.springer.com/article/10.1007/BF02293108
They may not be as accurate as the one you're using now (they may be though, not quite sure) but I think they'd be a good replacement or at very least good fallback methods for this kinda case.
Related: #455
There are several issues here:
- The code provided by @Mlambrechts is wrong, parameters r and p need to be determined as
double p = 1 / dispersion
anddouble r = lambda / (dispersion - 1)
- p = 0 actually should be considered to be an invalid parameter, since the distribution is not defined in this case (asymptotically, it reaches infinity, which cannot be represented as integer). In the current implementation, p = 0 not only leads to unhandled undefined mathematical operations (like division by zero), but also to an endless loop in the sampling method. I fixed this in #960 by disallowing p = 0. Note that the run time for sampling still goes to infinity when p approaches zero. Further improvements are welcome.