mathnet-numerics Suboptimal precision of Binomial CDF at extremes

I think the Distributions.Binomial.CDF method needs some work, since it behaves worse than a hacky one I made at very extreme values.

I was trying to solve Rosalind problem INDC, in particular for n=47. I was porting an older solution of mine where I implemented my own Binomial CDF and PDF. While changing these to use the "more reliable" MathNet implementation instead of my own, but discovered that the MathNet implementation is actually worse.

If you see my relevant commit, you should be able to see that I am calculating the CDF naively, by simply summing the PDF (Binomial.PMF seems to working well enough, I didn't notice problems). In the commented out method (try Ctrl+F "MathNet bug"), you will see that I was also trying a MathNet call which should in theory give the same result, and for most cases it's close enough (error of around 0.0001%, so probably a precision issue).

However, when we get to sets of parameters where the resulting probability is very small, like 10^-10, problems occur: My naive CDF continues to return the probability accurately enough for Rosalind to accept the solution as correct, but the MathNet call simply returns 0.0 instead. Because the probability gets turned into a Log10 later, this 0.0 causes catastrophic failure (since Log10(0)~-Inf).

For a less cumbersome test: The probability of obtaining 86 or more successes in 94 trials with probability 0.5 of succeeding at each one is obviously vanishingly small, but non-zero. My implementation returns 0.0000000000000000061807934732275974. 1 - Binomial.CDF(0.5, 94, 85) returns 0.0.

Given that this is a relatively pedestrian application, I think it speaks for a need to audit the Binomial.CDF implementation, and make sure it is able to cope better with probabilities very close to 1 or 0.

Dec 03 '15 03:12 Superbest

Note also WolframAlpha's calculation agrees with my result and not MathNet's.

Dec 03 '15 03:12 Superbest

BTW, as a temporary solution, I propose adding an if-clause that checks whether the parameters are very close to extremes, and uses my method in that case. I think my method might have very poor performance, so this isn't an ideal solution, but I think better than returning a qualitatively wrong result.

For a better solution, I see that Binomial.CDF is just a call to the Beta function. I'm afraid I'm not familiar enough with that to say how to go about fixing it.

Dec 03 '15 03:12 Superbest

Thanks for reporting this!

Dec 03 '15 05:12 cdrnet

@cdrnet No problem - any ideas on where one would start with the fix?

Dec 10 '15 23:12 Superbest

mathnet-numerics mathnet-numerics copied to clipboard

Suboptimal precision of Binomial CDF at extremes

mathnet-numerics
mathnet-numerics copied to clipboard