Add icdf functions for distributions
Description
We are looking for help to implement inverse cumulative distribution (ICDF) functions for our distributions!
How to help?
This PR should give a template on how to implement and test new icdf functions for distributions: https://github.com/pymc-devs/pymc/pull/6528
ICDF functions allow users to get the value associated with a specific cumulative probability.
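As a quick illustration of what an ICDF returns, here is a minimal sketch that uses only Python's standard library (`statistics.NormalDist`), not the PyMC API. It assumes a standard normal distribution and shows that the ICDF maps a cumulative probability back to the value that produces it.

```python
from statistics import NormalDist

# Standard normal from the Python standard library (an assumption for
# illustration; this is not a PyMC distribution object).
std_normal = NormalDist(mu=0.0, sigma=1.0)

# ICDF (quantile function): the value below which 97.5% of the mass lies.
x = std_normal.inv_cdf(0.975)
print(round(x, 4))  # 1.96

# The ICDF undoes the CDF: applying the CDF to x recovers the probability.
q_back = std_normal.cdf(x)
```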
So far we've added two examples for continuous distributions:
- Uniform: https://github.com/pymc-devs/pymc/blob/2fcce433548a411f695d684ffb2d001a82a35f20/pymc/distributions/continuous.py#L348-L351
- Normal: https://github.com/pymc-devs/pymc/blob/2fcce433548a411f695d684ffb2d001a82a35f20/pymc/distributions/continuous.py#L541-L548
And an example for a discrete distribution:
- Geometric: https://github.com/pymc-devs/pymc/blob/2fcce433548a411f695d684ffb2d001a82a35f20/pymc/distributions/discrete.py#L824-L832
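The math behind two of these examples can be sketched in plain Python. This is not the linked PyMC code, which operates on PyTensor variables. The function names are hypothetical, and the Geometric sketch assumes the support starts at 1 (number of trials up to and including the first success).

```python
import math


def uniform_icdf(q, lower, upper):
    """Quantile of Uniform(lower, upper): interpolate linearly between the bounds."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    return lower + q * (upper - lower)


def geometric_icdf(q, p):
    """Smallest integer k >= 1 such that P(X <= k) = 1 - (1 - p)**k >= q."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    # Invert 1 - (1 - p)**k = q for real k, then round up to an integer.
    k = math.ceil(math.log1p(-q) / math.log1p(-p))
    return max(k, 1)
```

Note the rounding step in the discrete case: the ICDF of a discrete distribution returns the smallest support point whose cumulative probability reaches `q`, so an exact inverse generally does not exist.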
Many sources describe the icdf function for a given distribution; feel free to use whichever works for you. To start with, I recommend checking:
- https://help.imsl.com/c/2016/html/cnlstat/index.html#page/CNL%2520Stat%2Fcsch11.14.01.html%23
- Wikipedia,
E.g.: https://en.wikipedia.org/wiki/Normal_distribution
On Wikipedia, the icdf is listed as "Quantile" in the distribution's summary box.
New tests have to be added in test_continuous.py for continuous distributions, and test_discrete.py for discrete ones. You can use existing tests as a template:
https://github.com/pymc-devs/pymc/blob/2fcce433548a411f695d684ffb2d001a82a35f20/tests/distributions/test_continuous.py#L282-L286
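To give a feel for what such a test verifies, here is a minimal pure-Python sketch of one possible check: sweep a grid of parameters and probabilities, and confirm that the CDF applied to the ICDF recovers the probability. The helper `check_icdf_roundtrip` and the logistic functions are hypothetical names for illustration only. This is not PyMC's test helper, and the linked tests remain the actual template to follow.

```python
import itertools
import math


def logistic_cdf(x, mu, s):
    """CDF of a logistic distribution with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


def logistic_icdf(q, mu, s):
    """Candidate ICDF under test: closed-form inverse of logistic_cdf."""
    return mu + s * math.log(q / (1.0 - q))


def check_icdf_roundtrip(icdf, cdf, param_grid, quantiles, tol=1e-9):
    """Assert cdf(icdf(q)) == q for every parameter combination (hypothetical helper)."""
    names = list(param_grid)
    for values in itertools.product(*param_grid.values()):
        params = dict(zip(names, values))
        for q in quantiles:
            recovered = cdf(icdf(q, **params), **params)
            assert math.isclose(recovered, q, abs_tol=tol), (params, q, recovered)


check_icdf_roundtrip(
    logistic_icdf,
    logistic_cdf,
    param_grid={"mu": [-2.0, 0.0, 3.5], "s": [0.5, 1.0, 4.0]},
    quantiles=[0.01, 0.25, 0.5, 0.75, 0.99],
)
```

A round-trip check only shows that the ICDF is consistent with the CDF it is paired with. Comparing against an independent reference implementation is a stronger test when one is available.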
Don't hesitate to ask any questions. You can grab as many distributions as you want to implement icdf functions for. Just make sure to write in this issue so that we can keep track of it.
Profit with your new open source KARMA!
The following distributions don't have an icdf method implemented:
- [x] Beta #6845
- [x] Kumaraswamy #6642
- [x] Exponential #6641
- [x] Laplace #6707
- [x] StudentT #6845
- [x] Cauchy #6747
- [x] HalfCauchy
- [x] Gamma #6845
- [x] HalfNormal
- [x] Weibull #6802
- [x] LogNormal #6766
- [x] HalfStudentT
- [ ] Wald
- [x] Pareto #6707
- [x] InverseGamma
- [ ] ExGaussian
- [ ] Binomial #7362
- [ ] BetaBinomial
- [ ] Poisson
- [ ] NegativeBinomial
- [ ] DiracDelta
- [x] DiscreteUniform #6617
- [ ] HyperGeometric
- [ ] Categorical
- [ ] CustomDist (allow user to pass one, or try to infer like we do for logp/logcdf already)
- [ ] AsymmetricLaplace
- [ ] SkewNormal
- [x] Triangular #6802
- [ ] DiscreteWeibull
- [x] Gumbel #6802
- [x] Logistic #6747
- [x] LogitNormal
- [ ] Interpolated
- [ ] Rice
- [x] Moyal #6802
- [ ] PolyaGamma
- [ ] Mixture (requires an iterative algorithm based on the logcdf, see here)
Note that not all icdf equations have a closed-form solution, so it's recommended to start with the ones that do: they are easier to implement and give other contributors templates for understanding the topic. The list above is not final, and I'll try to keep it updated with all distributions available for taking.
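For the closed-form cases, the icdf usually comes from solving `q = cdf(x)` for `x`. A minimal pure-Python sketch for two such distributions follows. The function and parameter names are illustrative assumptions (a rate `lam` for the Exponential, location `alpha` and scale `beta` for the Cauchy), and a PyMC implementation would use PyTensor operations rather than `math`.

```python
import math


def exponential_icdf(q, lam):
    """Solve q = 1 - exp(-lam * x) for x, where lam is the rate."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    # log1p(-q) is log(1 - q), computed accurately for small q.
    return -math.log1p(-q) / lam


def cauchy_icdf(q, alpha, beta):
    """Solve q = 1/2 + arctan((x - alpha) / beta) / pi for x."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be in (0, 1)")
    return alpha + beta * math.tan(math.pi * (q - 0.5))
```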
Hi @michaelraczycki or anyone else reviewing this, this is my first attempt to contribute to PyMC, so please let me know if this PR makes sense. If this turns out to be useful, I am happy to add ICDF functions for more distributions.
Hey @gokuld! Thank you for your contribution, it looks promising. For future reference, please add a comment under the issue letting others know that you're starting to work on a specific issue or part of it. This ensures that you're not working in parallel with someone else on the same part of the development.
> Hey @gokuld! Thank you for your contribution, it looks promising.
Thanks @michaelraczycki .
> For future reference, please add a comment under the issue letting others know that you're starting to work on a specific issue or part of it. This ensures that you're not working in parallel with someone else on the same part of the development.
Sure, I will post a comment when I start work to avoid parallel duplicate work from next time!
Hey all, I am starting work on the ICDF for the continuous beta distribution. Let me know if anyone else is working on this already. (@michaelraczycki)
@gokuld if there's no comment under this issue saying that someone has reserved it, you don't need to ask. Just call it and it's yours :) Also, if for any reason you can't or don't want to work on the issue anymore, please let us know here. Good luck!
@gokuld AFAICT the inverse CDF of the beta distribution doesn't have a closed form solution, so you would need an iterative algorithm which may not be trivial to write if you are not familiar with PyTensor. Ignore if you were aware of the fact ;)
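To make "iterative algorithm" concrete, here is a minimal pure-Python sketch of one option, bisection on the CDF. It is an illustration under stated assumptions, not a proposal for the PyMC implementation: a general Beta CDF needs the regularized incomplete beta function, so the sketch uses Beta(2, 2), whose CDF is the polynomial 3x² - 2x³, as a stand-in. It also does not address how to express the loop or its gradient in PyTensor.

```python
def icdf_bisect(cdf, q, lower, upper, tol=1e-12, max_iter=200):
    """Find x in [lower, upper] with cdf(x) close to q, for a nondecreasing cdf."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    lo, hi = lower, upper
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # Keep the half of the bracket that still contains the quantile.
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def beta22_cdf(x):
    """CDF of Beta(2, 2), used only as a simple stand-in for a general Beta CDF."""
    return 3.0 * x**2 - 2.0 * x**3


median = icdf_bisect(beta22_cdf, 0.5, 0.0, 1.0)
```

Bisection only needs the CDF to be monotone and the support to be bracketed, which makes it robust but slower than Newton-type methods that also use the density.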
@michaelraczycki sure! Thank you.
In addition I will also be implementing ICDFs for these:
- pymc.distributions.continuous.Kumaraswamy
- pymc.distributions.continuous.Exponential
@ricardoV94 Yes, however I discovered this only after I started working on the ICDF function for the beta distribution. I was about to create a betaincinv function in pytensor. However, I will likely need some review of the approach and help here (especially with implementing the gradient in pytensor), and was about to open that as a draft PR. If the iterative approach you mention turns out to be simpler to implement, I will go for it, but I need to learn more about it first. Perhaps we can discuss this in the draft PR for the beta ICDF.
I'll be picking up
- Laplace
- Pareto
Cheers :smile:
I have no immediate plans to finish the work on the beta distribution; anyone else interested may pick it up.
Good luck @james-2001, and thank you for your contribution @gokuld!
I will work on the LogNormal :)
I'm tackling now:
- Cauchy distribution
- Logistic distribution
😄
~~This should suffice for all the "Half"Distributions:~~

    def icdf(value, *args):
        return icdf(abs(Full.dist(*args)), value)

~~Where Full is the equivalent non-half version of the distribution (Normal for HalfNormal, Student for HalfStudent and so on).~~
Update: No, I don't think it will work, because our automatic cdf is also wrong for these
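For reference, the closed-form relationship itself is simple when the full distribution is symmetric about zero: for x >= 0 the CDF of the absolute value is 2·F(x) - 1, so the quantile is F⁻¹((1 + q) / 2). Below is a minimal sketch using only the standard library. It is separate from the automatic derivation discussed above, the function names are hypothetical, and the zero-location parametrization is an assumption.

```python
import math
from statistics import NormalDist


def half_normal_icdf(q, sigma):
    """Quantile of |X| for X ~ Normal(0, sigma), valid for 0 <= q < 1."""
    # The half CDF is 2 * F(x) - 1, so invert the full CDF at (1 + q) / 2.
    return NormalDist(0.0, sigma).inv_cdf((1.0 + q) / 2.0)


def half_cauchy_icdf(q, beta):
    """Quantile of |X| for X ~ Cauchy(0, beta), valid for 0 <= q < 1."""
    # The half CDF is (2 / pi) * arctan(x / beta); solve it for x.
    return beta * math.tan(math.pi * q / 2.0)
```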
Thanks for the tips. Since I am already working on the LogNormal and just got the Cauchy icdf merged, I will add the HalfCauchy and HalfNormal implementations to my next PR.
Once I get used to this approach I can try to adapt it to the other "Halfs".
Thanks for checking out the half-dist idea @amyoshino, you made me realize these don't work as other transforms and the automatic icdf should raise. I opened a PR for that effect: https://github.com/pymc-devs/pymc/pull/6793
> Thanks for checking out the half-dist idea @amyoshino, you made me realize these don't work as other transforms and the automatic icdf should raise. I opened a PR for that effect: #6793
@ricardoV94 I'm glad I was able to help! 😄
I will now work on the Triangular, Weibull, Gumbel and Moyal distributions :)
It looks like the remaining ones have no closed form (at least none that I have found so far). I will give it a try on developing the icdf functions for the remaining ones. It might take a while, but I will do my best to get up to speed as fast as possible. So, just to get some focus in here, I will start with some that require the Inverse Regularized Gamma function and the Inverse Regularized Beta function, to get used to their implementation:
- Gamma Distribution
- ChiSquared Distribution
- Beta Distribution
- StudentT Distribution
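For orientation, these four quantile functions can be written in terms of the two inverse functions as follows. This is a sketch under assumed parametrizations (Gamma with shape $\alpha$ and rate $\beta$, StudentT with location $\mu$ and scale $\sigma$), where $P^{-1}(a, q)$ denotes the inverse of the regularized lower incomplete gamma function in its second argument and $I^{-1}_{q}(a, b)$ the inverse of the regularized incomplete beta function.

$$
\text{Gamma}(\alpha, \beta):\quad F(x) = P(\alpha, \beta x) \;\Rightarrow\; F^{-1}(q) = \frac{P^{-1}(\alpha, q)}{\beta}
$$

$$
\text{ChiSquared}(\nu) = \text{Gamma}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right):\quad F^{-1}(q) = 2\,P^{-1}\!\left(\tfrac{\nu}{2}, q\right)
$$

$$
\text{Beta}(a, b):\quad F(x) = I_{x}(a, b) \;\Rightarrow\; F^{-1}(q) = I^{-1}_{q}(a, b)
$$

$$
\text{StudentT}(\nu, \mu, \sigma),\ q \ge \tfrac{1}{2}:\quad x = I^{-1}_{2(1-q)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right),\qquad F^{-1}(q) = \mu + \sigma\sqrt{\frac{\nu\,(1 - x)}{x}}
$$

For $q < \tfrac{1}{2}$ the StudentT quantile follows from symmetry about $\mu$: $F^{-1}(q) = 2\mu - F^{-1}(1 - q)$.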
I will work on Binomial now.
Hi, are there any more functions that need to be added here?