pymc

Add icdf functions for distributions

Open • michaelraczycki opened this issue 2 years ago • 26 comments

Description

We are looking for help to implement inverse cumulative distribution (ICDF) functions for our distributions!

How to help?

This PR can serve as a template for implementing and testing new icdf functions for distributions: https://github.com/pymc-devs/pymc/pull/6528

ICDF functions allow users to get the value associated with a specific cumulative probability.
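Formally, the ICDF (quantile function) of a distribution with CDF F is the generalized inverse below. For a continuous, strictly increasing F it is the ordinary inverse; for a discrete distribution it returns the smallest support value whose CDF reaches q.

```latex
F^{-1}(q) = \inf\{\, x \in \mathbb{R} : F(x) \ge q \,\}, \qquad 0 < q < 1
```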

So far we've added two examples for continuous distributions:

  • Uniform: https://github.com/pymc-devs/pymc/blob/2fcce433548a411f695d684ffb2d001a82a35f20/pymc/distributions/continuous.py#L348-L351
  • Normal: https://github.com/pymc-devs/pymc/blob/2fcce433548a411f695d684ffb2d001a82a35f20/pymc/distributions/continuous.py#L541-L548
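As a stdlib-only sketch (plain Python floats, not the PyTensor expressions PyMC actually uses), the two continuous examples boil down to the formulas below. The Normal quantile is mu + sigma * sqrt(2) * erfinv(2q - 1); Python's standard library has no erfinv, so this sketch delegates the standard-normal quantile to `statistics.NormalDist.inv_cdf`. The function names are illustrative only.

```python
from statistics import NormalDist


def uniform_icdf(q: float, lower: float, upper: float) -> float:
    # The CDF is (x - lower) / (upper - lower), so its inverse is a linear interpolation
    return lower + q * (upper - lower)


def normal_icdf(q: float, mu: float, sigma: float) -> float:
    # Equivalent to mu + sigma * sqrt(2) * erfinv(2q - 1); the stdlib has no erfinv,
    # so the standard-normal quantile comes from NormalDist.inv_cdf
    return mu + sigma * NormalDist().inv_cdf(q)


print(uniform_icdf(0.25, 2.0, 6.0))  # 3.0
print(round(normal_icdf(0.975, 0.0, 1.0), 2))  # 1.96
```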

And an example for a discrete distribution:

  • Geometric: https://github.com/pymc-devs/pymc/blob/2fcce433548a411f695d684ffb2d001a82a35f20/pymc/distributions/discrete.py#L824-L832
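The Geometric case shows the usual pattern for discrete distributions: invert the CDF as if it were continuous, then round up to the next support point. A plain-Python sketch, assuming support k = 1, 2, ... (number of trials until the first success), 0 < p < 1 and 0 < q < 1; the function names are hypothetical:

```python
import math


def geometric_cdf(k: int, p: float) -> float:
    # P(X <= k) = 1 - (1 - p)**k for k = 1, 2, ...
    return 1.0 - (1.0 - p) ** k


def geometric_icdf(q: float, p: float) -> int:
    # Smallest k with 1 - (1 - p)**k >= q  <=>  k >= log(1 - q) / log(1 - p)
    # (the inequality flips because log(1 - p) < 0); log1p keeps small q and p accurate
    return math.ceil(math.log1p(-q) / math.log1p(-p))


print(geometric_icdf(0.6, 0.5))  # 2
```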

Many sources describe the icdf of specific distributions; feel free to use whichever works for you. To start with, I recommend checking:

  • https://help.imsl.com/c/2016/html/cnlstat/index.html#page/CNL%2520Stat%2Fcsch11.14.01.html%23
  • Wikipedia, e.g. https://en.wikipedia.org/wiki/Normal_distribution (screenshot from 2023-03-18 not reproduced here). There the icdf is listed as "Quantile" in the distribution's summary box.

New tests have to be added in test_continuous.py for continuous distributions, and test_discrete.py for discrete ones. You can use existing tests as a template:

https://github.com/pymc-devs/pymc/blob/2fcce433548a411f695d684ffb2d001a82a35f20/tests/distributions/test_continuous.py#L282-L286
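Whatever helper the linked test uses, the core property an icdf test checks is that the new function inverts the existing CDF. Below is a hypothetical, stdlib-only version of that check (not PyMC's testing utilities), exercised on the Normal distribution through `statistics.NormalDist`:

```python
import math
from statistics import NormalDist


def check_icdf_roundtrip(cdf, icdf, qs, tol=1e-9):
    """Hypothetical helper: qs must be sorted ascending; checks monotonicity and cdf(icdf(q)) == q."""
    xs = [icdf(q) for q in qs]
    # Quantiles must not decrease as the probability grows
    assert all(a <= b for a, b in zip(xs, xs[1:])), "icdf is not monotonic"
    # For a continuous distribution, applying the CDF must recover the probability
    for q, x in zip(qs, xs):
        assert math.isclose(cdf(x), q, abs_tol=tol), (q, x, cdf(x))


dist = NormalDist(mu=1.0, sigma=2.0)
check_icdf_roundtrip(dist.cdf, dist.inv_cdf, [0.001, 0.1, 0.5, 0.9, 0.999])
```

For discrete distributions the round trip only holds as cdf(icdf(q)) >= q, so the equality check would need adjusting there.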

Don't hesitate to ask any questions. You can grab as many distributions to implement icdf functions for as you want; just make sure to comment on this issue so that we can keep track of them.

Enjoy your new open source KARMA!

The following distributions don't have an icdf method implemented:

  • [x] Beta #6845
  • [x] Kumaraswamy #6642
  • [x] Exponential #6641
  • [x] Laplace #6707
  • [x] StudentT #6845
  • [x] Cauchy #6747
  • [x] HalfCauchy
  • [x] Gamma #6845
  • [x] HalfNormal
  • [x] Weibull #6802
  • [x] LogNormal #6766
  • [x] HalfStudentT
  • [ ] Wald
  • [x] Pareto #6707
  • [x] InverseGamma
  • [ ] ExGaussian
  • [ ] Binomial #7362
  • [ ] BetaBinomial
  • [ ] Poisson
  • [ ] NegativeBinomial
  • [ ] DiracDelta
  • [x] DiscreteUniform #6617
  • [ ] HyperGeometric
  • [ ] Categorical
  • [ ] CustomDist (allow user to pass one, or try to infer like we do for logp/logcdf already)
  • [ ] AsymmetricLaplace
  • [ ] SkewNormal
  • [x] Triangular #6802
  • [ ] DiscreteWeibull
  • [x] Gumbel #6802
  • [x] Logistic #6747
  • [x] LogitNormal
  • [ ] Interpolated
  • [ ] Rice
  • [x] Moyal #6802
  • [ ] PolyaGamma
  • [ ] Mixture (requires an iterative algorithm based on the logcdf, see here)

Note that not every icdf has a closed-form solution, so we recommend starting with the ones that do: they are easier to implement, and they give other contributors templates that make the topic easier to understand. The list above is not final; I'll try to keep it updated so that it contains all distributions available for taking.
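When no closed form exists, one fallback is to invert the CDF numerically. The sketch below uses plain bisection on the CDF, applied to a hypothetical 30/70 mixture of two normals (cf. the Mixture item above). It is a generic plain-Python illustration, not the logcdf-based PyTensor algorithm the list refers to, and it assumes the CDF is continuous and non-decreasing and that the caller supplies a bracket [lower, upper] containing the quantile.

```python
from statistics import NormalDist


def bisect_icdf(cdf, q, lower, upper, tol=1e-12, max_iter=200):
    # Assumes cdf is continuous, non-decreasing and cdf(lower) <= q <= cdf(upper)
    if not cdf(lower) <= q <= cdf(upper):
        raise ValueError("q is not bracketed by [lower, upper]")
    for _ in range(max_iter):
        mid = 0.5 * (lower + upper)
        # Keep the half-interval that still contains the quantile
        if cdf(mid) < q:
            lower = mid
        else:
            upper = mid
        if upper - lower < tol:
            break
    return 0.5 * (lower + upper)


# Hypothetical example: mixture of N(-2, 1) with weight 0.3 and N(3, 0.5) with weight 0.7
components = [(0.3, NormalDist(-2.0, 1.0)), (0.7, NormalDist(3.0, 0.5))]


def mixture_cdf(x):
    # The mixture CDF is the weighted sum of the component CDFs
    return sum(w * d.cdf(x) for w, d in components)


median = bisect_icdf(mixture_cdf, 0.5, -20.0, 20.0)
```

Bisection is slow but robust (about one bit of accuracy per CDF evaluation); Newton-type updates converge faster but need the pdf and safeguards.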

michaelraczycki · Mar 18 '23 13:03

Hi @michaelraczycki or anyone else reviewing this, this is my first attempt to contribute to PyMC, so please let me know if this PR makes sense. If this turns out to be useful, I am happy to add ICDF functions for more distributions.

gokuld · Mar 23 '23 11:03

Hey @gokuld! Thank you for your contribution, it looks promising. For future reference, please add a comment under the issue letting others know that you're starting to work on a specific issue or part of the issue. This ensures that you're not working in parallel with someone else on the same part of the development.

michaelraczycki · Mar 23 '23 20:03

Hey @gokuld! Thank you for your contribution, it looks promising.

Thanks @michaelraczycki .

For future reference, please add a comment under the issue letting others know that you're starting to work on a specific issue or part of the issue. This ensures that you're not working in parallel with someone else on the same part of the development.

Sure, from now on I will post a comment when I start work, to avoid duplicate work in parallel!

gokuld · Mar 24 '23 06:03

Hey all, I am starting work on the ICDF for the continuous beta distribution. Let me know if anyone else is working on this already. (@michaelraczycki)

gokuld · Mar 31 '23 10:03

@gokuld if there's no comment under this issue saying that someone has reserved it, you don't need to ask. Just claim it and it's yours :) Also, if for any reason you can't or don't want to work on the issue anymore, please let us know here. Good luck!

michaelraczycki · Mar 31 '23 12:03

@gokuld AFAICT the inverse CDF of the beta distribution doesn't have a closed form solution, so you would need an iterative algorithm which may not be trivial to write if you are not familiar with PyTensor. Ignore if you were aware of the fact ;)

ricardoV94 · Mar 31 '23 14:03

@michaelraczycki sure! Thank you.

In addition, I will also be implementing ICDFs for these:

  • pymc.distributions.continuous.Kumaraswamy
  • pymc.distributions.continuous.Exponential
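Both of these invert in closed form. A plain-Python sketch of the formulas, assuming 0 < q < 1; the parameter names lam, a and b follow PyMC's Exponential and Kumaraswamy, but the functions themselves are illustrative, not PyMC code:

```python
import math


def exponential_icdf(q: float, lam: float) -> float:
    # F(x) = 1 - exp(-lam * x)  =>  x = -log(1 - q) / lam
    return -math.log1p(-q) / lam


def kumaraswamy_icdf(q: float, a: float, b: float) -> float:
    # F(x) = 1 - (1 - x**a)**b on [0, 1]  =>  x = (1 - (1 - q)**(1 / b))**(1 / a)
    return (1.0 - (1.0 - q) ** (1.0 / b)) ** (1.0 / a)


print(round(exponential_icdf(0.5, 1.0), 4))  # 0.6931 (the median is log(2) / lam)
```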

@ricardoV94 Yes, however I discovered this only after I started working on the ICDF function for the beta distribution. I was about to create a betaincinv function in pytensor. However, I will likely need some review of the approach and some help (especially with implementing the gradient in pytensor), and I was about to open a draft PR for that. If the iterative approach you mention turns out to be simpler to implement, I will go for it, but I need to learn more about it. Perhaps we can discuss this in the draft PR for the beta ICDF.

gokuld · Mar 31 '23 18:03

I'll be picking up

  • Laplace
  • Pareto

Cheers :smile:
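Laplace and Pareto also have closed-form quantiles. A plain-Python sketch, assuming PyMC's parameter names (mu, b for Laplace; alpha, m for Pareto) and 0 < q < 1; the function names are illustrative:

```python
import math


def laplace_icdf(q: float, mu: float, b: float) -> float:
    # Invert each exponential half of the CDF around the median mu:
    # x = mu - b * sign(q - 0.5) * log(1 - 2 * |q - 0.5|)
    return mu - b * math.copysign(1.0, q - 0.5) * math.log1p(-2.0 * abs(q - 0.5))


def pareto_icdf(q: float, alpha: float, m: float) -> float:
    # F(x) = 1 - (m / x)**alpha for x >= m  =>  x = m * (1 - q)**(-1 / alpha)
    return m * (1.0 - q) ** (-1.0 / alpha)


print(round(laplace_icdf(0.75, 0.0, 1.0), 4))  # 0.6931 (= log 2)
print(pareto_icdf(0.75, 1.0, 1.0))  # 4.0
```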

james-2001 · May 04 '23 20:05

I have no immediate plans of finishing the work on the beta distribution, anyone else interested may pick it up.

gokuld · May 04 '23 20:05

Good luck @james-2001, and thank you for your contribution @gokuld!

michaelraczycki · May 05 '23 07:05

I will work on the LogNormal :)
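Since exp is strictly increasing, the quantiles of X = exp(Y) are exp of the quantiles of Y ~ Normal(mu, sigma). A short plain-Python sketch of that idea (function name illustrative, parameter names as in PyMC's LogNormal):

```python
import math
from statistics import NormalDist


def lognormal_icdf(q: float, mu: float, sigma: float) -> float:
    # Monotone transform: quantile of exp(Y) = exp(quantile of Y)
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))


print(round(lognormal_icdf(0.5, 1.0, 0.5), 4))  # 2.7183 (the median is exp(mu))
```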

amyoshino · Jun 01 '23 01:06

I'm tackling now:

  • Cauchy distribution
  • Logistic distribution

😄
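Both have closed-form quantiles: the Cauchy one is a tangent and the Logistic one is the logit. A plain-Python sketch, assuming PyMC's parameter names (alpha, beta for Cauchy; mu, s for Logistic) and 0 < q < 1; the function names are illustrative:

```python
import math


def cauchy_icdf(q: float, alpha: float, beta: float) -> float:
    # F(x) = 1/2 + atan((x - alpha) / beta) / pi  =>  x = alpha + beta * tan(pi * (q - 1/2))
    return alpha + beta * math.tan(math.pi * (q - 0.5))


def logistic_icdf(q: float, mu: float, s: float) -> float:
    # F(x) = 1 / (1 + exp(-(x - mu) / s))  =>  x = mu + s * log(q / (1 - q))
    return mu + s * math.log(q / (1.0 - q))


print(round(cauchy_icdf(0.75, 0.0, 1.0), 6))  # 1.0
print(logistic_icdf(0.5, 2.0, 1.0))  # 2.0
```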

amyoshino · Jun 01 '23 04:06

~~This should suffice for all the "Half"Distributions:~~

def icdf(value, *args):
    return icdf(abs(Full.dist(*args)), value)

~~Where Full is the equivalent non-half version of the distribution (Normal for HalfNormal, Student for HalfStudent and so on).~~

Update: No, I don't think it will work, because our automatic cdf is also wrong for these.
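The underlying math: for X ~ Normal(0, sigma), P(|X| <= x) = 2 * Phi(x / sigma) - 1, so the quantile of |X| is sigma * Phi^{-1}((1 + q) / 2), not the Normal quantile at q. A plain-Python sketch of the HalfNormal and HalfCauchy closed forms (function names are illustrative; this says nothing about how PyMC's derived-icdf machinery handles abs):

```python
import math
from statistics import NormalDist


def halfnormal_icdf(q: float, sigma: float) -> float:
    # P(|X| <= x) = 2 * Phi(x / sigma) - 1  =>  x = sigma * Phi^{-1}((1 + q) / 2)
    return sigma * NormalDist().inv_cdf((1.0 + q) / 2.0)


def halfcauchy_icdf(q: float, beta: float) -> float:
    # P(|X| <= x) = (2 / pi) * atan(x / beta) for X ~ Cauchy(0, beta)
    return beta * math.tan(math.pi * q / 2.0)


# Reusing the full distribution's quantile at q gives a different, even negative, value:
naive = NormalDist(0.0, 1.0).inv_cdf(0.25)  # about -0.674, outside the support of |X|
correct = halfnormal_icdf(0.25, 1.0)  # about 0.319
```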

ricardoV94 · Jun 16 '23 08:06

Thanks for the tips! I am going to work on the HalfCauchy and HalfNormal implementations and add them to my next PR, since I am already working on LogNormal and just got the Cauchy icdf merged.

Once I get used to this approach I can try to adapt it to the other "Halfs".

amyoshino · Jun 16 '23 13:06

Thanks for checking out the half-dist idea @amyoshino, you made me realize these don't work as other transforms and the automatic icdf should raise. I opened a PR for that effect: https://github.com/pymc-devs/pymc/pull/6793

ricardoV94 · Jun 23 '23 14:06

Thanks for checking out the half-dist idea @amyoshino, you made me realize these don't work as other transforms and the automatic icdf should raise. I opened a PR for that effect: #6793

@ricardoV94 I'm glad I was able to help! 😄

amyoshino · Jun 24 '23 01:06

I will now work on the Triangular, Weibull, Gumbel and Moyal distributions :)
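All four of these have closed-form quantiles. A plain-Python sketch, assuming PyMC's parameter names (Triangular: lower, c, upper; Weibull: alpha, beta; Gumbel: mu, beta; Moyal: mu, sigma) and 0 < q < 1; the function names are illustrative. For Moyal the standard library has no erfcinv, so the sketch uses the identity erfc(y) = 2 * Phi(-y * sqrt(2)) to express the inverse through the standard-normal quantile.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def triangular_icdf(q, lower, c, upper):
    # Invert the two quadratic pieces of the CDF, split at F(c) = (c - lower) / (upper - lower)
    if q < (c - lower) / (upper - lower):
        return lower + math.sqrt(q * (upper - lower) * (c - lower))
    return upper - math.sqrt((1.0 - q) * (upper - lower) * (upper - c))


def weibull_icdf(q, alpha, beta):
    # F(x) = 1 - exp(-(x / beta)**alpha)  =>  x = beta * (-log(1 - q))**(1 / alpha)
    return beta * (-math.log1p(-q)) ** (1.0 / alpha)


def gumbel_icdf(q, mu, beta):
    # F(x) = exp(-exp(-(x - mu) / beta))  =>  x = mu - beta * log(-log(q))
    return mu - beta * math.log(-math.log(q))


def moyal_icdf(q, mu, sigma):
    # F(x) = erfc(exp(-z / 2) / sqrt(2)) with z = (x - mu) / sigma.
    # With erfc(y) = 2 * Phi(-y * sqrt(2)):  exp(-z / 2) = -Phi^{-1}(q / 2)
    z = -2.0 * math.log(-_STD_NORMAL.inv_cdf(q / 2.0))
    return mu + sigma * z
```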

amyoshino · Jun 27 '23 03:06

It looks like the remaining ones have no closed form (at least none that I have found so far). I will try developing the icdf functions for them anyway. It might take a while, but I will do my best to get up to speed as fast as possible. To keep some focus, I will start with those that require the inverse regularized gamma function and the inverse regularized beta function, to get used to implementing them:

  • Gamma Distribution
  • ChiSquared Distribution
  • Beta Distribution
  • StudentT Distribution
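As a stdlib-only sketch of the iterative route for Gamma: for an integer shape k (the Erlang case) the regularized lower incomplete gamma function has a finite-sum form, and Newton's method can invert it using the pdf as the derivative, with a bisection fallback to keep every step inside a bracket. This illustrates the numerical approach under that integer-shape assumption only (plus 0 < q < 1); a general-shape implementation needs the inverse regularized gamma function (SciPy exposes it as `scipy.special.gammaincinv`), and none of the function names below come from PyMC.

```python
import math


def erlang_cdf(x: float, k: int, rate: float) -> float:
    # Regularized lower incomplete gamma P(k, rate * x) for integer shape k:
    # P(k, y) = 1 - exp(-y) * sum_{i=0}^{k-1} y**i / i!
    y = rate * x
    term = total = 1.0
    for i in range(1, k):
        term *= y / i
        total += term
    return 1.0 - math.exp(-y) * total


def erlang_pdf(x: float, k: int, rate: float) -> float:
    return rate**k * x ** (k - 1) * math.exp(-rate * x) / math.factorial(k - 1)


def gamma_icdf(q: float, k: int, rate: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    # Grow an upper bracket until cdf(hi) >= q; cdf(0) = 0 <= q gives the lower bracket
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, k, rate) < q:
        hi *= 2.0
    mean = k / rate
    x = mean if mean < hi else 0.5 * hi  # start from the mean when it is inside the bracket
    for _ in range(max_iter):
        f = erlang_cdf(x, k, rate) - q
        if f == 0.0:
            return x
        # Shrink the bracket using the sign of cdf(x) - q
        if f < 0.0:
            lo = x
        else:
            hi = x
        d = erlang_pdf(x, k, rate)
        new_x = x - f / d if d > 0.0 else 0.5 * (lo + hi)  # Newton step
        if not lo < new_x < hi:
            new_x = 0.5 * (lo + hi)  # fall back to bisection
        if abs(new_x - x) <= tol * max(1.0, x):
            return new_x
        x = new_x
    return x
```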

amyoshino · Jul 14 '23 13:07

I will work on Binomial now.
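Binomial has no closed-form quantile, but for moderate n a discrete icdf can simply walk the support, accumulating the pmf until the CDF reaches q. A plain-Python sketch of that approach, assuming 0 < p < 1, 0 < q < 1 and n small enough that (1 - p)**n does not underflow; the function name is hypothetical and this is not necessarily the approach taken in #7362:

```python
def binomial_icdf(q: float, n: int, p: float) -> int:
    # Smallest k in {0, ..., n} with CDF(k) >= q, found by accumulating the pmf
    pmf = (1.0 - p) ** n  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < q and k < n:
        # Recurrence: pmf(k + 1) = pmf(k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k


print(binomial_icdf(0.5, 10, 0.5))  # 5
```

Because the loop costs O(k) steps, a production version for large n would more likely start near the mean or invert the regularized incomplete beta function instead.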

niknow · Jun 15 '24 13:06

Hi, are there any more functions that need to be added here?

fireddd · Sep 20 '24 18:09