Additional Functions for Statistical Distributions (PPF, SF)
Motivation
A Percent Point Function (or Quantile Function or Inverse CDF) for each covered distribution would be a valuable addition, since it's highly relevant for common statistical problems and tests. Furthermore, complementary to the CDF, a Survival Function (1-cdf(x)) would be useful for statistics and engineering. The latter could be implemented as a separate procedure or by letting CDF return 1-cdf(x), conditional on an optional argument passed to the CDF function.
Prior Art
This is an example of an f90 implementation of a normal percent point function.
Reference: Michael Wichura, Algorithm AS 241: The Percentage Points of the Normal Distribution, Applied Statistics, Volume 37, Number 3, 1988, pages 477-484.
Additional Information
No response
Hey, I am interested in contributing for these features beginning with the normal distribution. I have been going through the prior art and have a potential plan for this:
-
A separate procedure for survival function
-
Percent Point Function For this I was thinking of using an inverse error function. As error function was used to calculate the cdf. But I don't there is an inverse error function in fortran. I was also thinking about using binary search to find x such that cdf(x) - p = 0. But this would be too slow. So, implementing an AS 241 approach seems like the best choice.
I am ready to start working on a PR that includes normal_sf and normal_ppf (using AS 241).
Please let me know if this plan looks good
Hey @BhoomishGupta ,
Thanks for the interest! Fortran does not have an inverse error function, but it's been discussed to add it to stdlib. That would make a few things easier. Yes, the binary search/bisection method are likely slower albeit simple, so the AS241 approach may be better. What do others think, looking at it?
I'm curious method were you thinking of using for the survival function to ensure tail stability? (I've not looked at this in detail, since 1-cdf has always been good enough for my applications).
Hey @sebastian-mutz,
I was thinking that instead of using survival function = 1 - cdf, which will lead to tail stability issues. I am going to write the full formula for survival function using erfc(erfc is calculated directly. So, it avoids subtraction from 1).
So the formula for survival function becomes : 0.5 * erfc(x / sqrt(2.0))
Hey @BhoomishGupta, that sounds good. Note that I'm not one of the major stdlib contributors, but I do work quite a bit with statistics, and I'd be happy to look over your PRs here.
@BhoomishGupta your proposition sounds good to me. I suggest that you open a PR with some code and specs. The community can then discuss the API and implementation.
Great, thank you @jvdp1 and @sebastian-mutz for the green light!
I'll get started on the PR with the proposed implementations for normal_sf (using the erfc method) and normal_ppf (using Algorithm AS 241)
I'll link the PR to this issue once it's ready for review and further discussion.