pymc icon indicating copy to clipboard operation
pymc copied to clipboard

Add icdf functions for Beta, Gamma, Chisquared and StudentT distributions

Open amyoshino opened this issue 2 years ago • 24 comments

What is this PR about? Adds ICDF functions to Beta, Gamma, Chisquared and StudentT distributions.

Issue https://github.com/pymc-devs/pymc/issues/6612

This is a work in progress. The quantile functions depend on two main functions:

  • Inverse of the regularized incomplete beta function: betaincinv
  • Inverse to the regularized lower incomplete gamma function: gammaincinv

Those are implemented in SciPy but not in Pytensor yet. When importing the two functions from scipy.special and plugging them in the desired formula, I get a TypeError like this below:

"TypeError: ufunc 'betaincinv' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''"

image

It looks like the most straightforward way to make these icdf functions work is to implement them in Pytensor, in the same way the pt.betainc has been implemented

Any comments about if this is the right path? If so I am happy to open an issue in Pytensor as a feature request, and possibly helping on its development if nobody else wants to take it. But if there is any ideas on how to make it simpler than that, let me know.

References:

  • SciPy betaincinv: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.betaincinv.html
  • SciPy gammaincinv: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.gammaincinv.html
  • ChiSquared: https://www.wolframalpha.com/input?i=chi+squared+distribution+quantile+function
  • Beta: https://www.wolframalpha.com/input?i=beta+distribution+quantile+function
  • Gamma: https://www.wolframalpha.com/input?i=gamma+distribution+quantile+function
  • StudentT: https://www.wolframalpha.com/input?i=student+distribution+quantile+function

Checklist

Major / Breaking Changes

  • ...

New features

  • ...

Bugfixes

  • ...

Documentation

  • ...

Maintenance

  • ...

:books: Documentation preview :books:: https://pymc--6845.org.readthedocs.build/en/6845/

amyoshino avatar Aug 04 '23 23:08 amyoshino

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 90.97%. Comparing base (a033261) to head (262d868).

:exclamation: Current head 262d868 differs from pull request most recent head 00c1f34. Consider uploading reports for the commit 00c1f34 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6845      +/-   ##
==========================================
- Coverage   92.29%   90.97%   -1.33%     
==========================================
  Files         100      100              
  Lines       16875    16887      +12     
==========================================
- Hits        15575    15363     -212     
- Misses       1300     1524     +224     
Files Coverage Δ
pymc/distributions/continuous.py 97.81% <100.00%> (+0.02%) :arrow_up:

... and 14 files with indirect coverage changes

codecov[bot] avatar Aug 04 '23 23:08 codecov[bot]

@ricardoV94 now that https://github.com/pymc-devs/pytensor/pull/502 is merged, we are able to complete this PR.

The tests are failing because betaincinv and gammaincinv cannot be imported from pytensor.tensor.math, I guess it is matter of time to pytensor to be updated. I will restart the checks when it happens. Locally all tests are passing.

Let me know what you think.

amyoshino avatar Jan 07 '24 19:01 amyoshino

You can see which PyTensor version is being installed in the CI.

ricardoV94 avatar Jan 07 '24 20:01 ricardoV94

Yeah it's strange it's still installing 2.18.4, even though 2.18.5 was released already some days ago. We may need to be just a bit more patient?

ricardoV94 avatar Jan 07 '24 20:01 ricardoV94

Yeah it's strange it's still installing 2.18.4, even though 2.18.5 was released already some days ago. We may need to be just a bit more patient?

Right! That's what I thought, I saw the pytenor release 4 days ago and thought it was a good experience to try it now and see if the update immediately applies. No rush on it, it will work soon.

amyoshino avatar Jan 07 '24 22:01 amyoshino

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@maresb any idea why CI is not yet installing the last PyTensor? Was there an issue on the conda side?

ricardoV94 avatar Jan 09 '24 06:01 ricardoV94

There was a glitch in the deployment of 2.18.5 to anaconda.org.

I reran it, so hopefully it fixes things.

maresb avatar Jan 09 '24 14:01 maresb

It uploaded successfully on anaconda.org, so I restarted the tests.

maresb avatar Jan 09 '24 21:01 maresb

Interesting way to restart the tests! I searched for and the recommendation I found was to push an empty commit, but I think this is nicer.

amyoshino avatar Jan 09 '24 21:01 amyoshino

You can also just ask a dev with write permissions. We can restart them from GitHub interface

ricardoV94 avatar Jan 09 '24 22:01 ricardoV94

Apologies, I have re-requested review by mistake.

amyoshino avatar Jan 10 '24 01:01 amyoshino

Apologies, I have re-requested review by mistake.

No worries!

ricardoV94 avatar Jan 11 '24 15:01 ricardoV94

@ricardoV94 I don't think it is failing due to the changes we've made in this PR. The failed ones below seems to be unrelated with the icdf functions, let me know if I am mistaken. I've synced the branch, let's see if we still get the errors.

tests/distributions/test_mixture.py::TestHurdleMixtures::test_hurdle_zero_draws[dist-0.05-non_psi_args8] FAILED [ 97%] tests/distributions/test_mixture.py::TestHurdleMixtures::test_hurdle_zero_draws[dist-0.3-non_psi_args9] FAILED [ 97%] tests/distributions/test_mixture.py::TestHurdleMixtures::test_hurdle_zero_draws[dist-0.4-non_psi_args10] FAILED [ 97%] tests/distributions/test_mixture.py::TestHurdleMixtures::test_hurdle_zero_draws[dist-0.9-non_psi_args11] FAILED [ 98%]

amyoshino avatar Jan 13 '24 11:01 amyoshino

They could actually be related. The hurdle mixture uses truncated rvs for the non-zero part. And with these new icdfs several truncated variables will switch from rejection sampling to inverse cdf sampling in the random graph.

I restarted, but I would also suggest you check locally

ricardoV94 avatar Jan 14 '24 07:01 ricardoV94

Got it, thanks for the pointers. I am still investigating what is the problem. I have spent few hours trying to debug but I still can't find what is going on. So far, drawing from Gamma works well, but inside the Truncated it does not. The same do not happen with other functions that I tried (Normal, Beta, HalfCauchy, InverseGamma - picked here some that have icdf functions and some that does not).

It looks like the error points to this numpy function, and now I am in the quest to find how to access high and low passed to the rng for the Truncated Gamma: https://github.com/numpy/numpy/blob/448e4dabb5e7d593af7d164a335dc7e44e17ec95/numpy/random/_generator.pyx#L1081

Here is what I got until now, and I will keep you posted about progress. But If by looking at it you have any clues, let me know.

Drawing from Gamma, Beta, etc works as expected (I added some prints for debug purposes): image

Drawing from Truncated only fails for Gamma: image

amyoshino avatar Jan 17 '24 12:01 amyoshino

I have an update but it is still a mystery for me that it raises the high - low overflow error when passing only lower but not when passing lower and upper. And that it only happens with Truncated(Gamma) but not with Truncated(Beta) for example.

image

Work in progress but let me know if you have any ideas on where to look.

amyoshino avatar Jan 19 '24 02:01 amyoshino

I just narrowed the problem down to the following: As stated by @ricardoV94, the hurdle mixture uses truncated rvs for the non-zero part. The failing tests are breaking when passing Gamma.dist to the Truncated.dist.

  • When I try to simulate the part it breaks the test with Truncated.dist(Gamma.dist(alpha = 5, beta = 2), lower = 0.01).eval() it fails in the middle when trying to sample from a uniform distribution with UniformRV.rng_fn. It does so because the inputs for this function (rng_fn) in tensor.random.op.py is something like [array(7.64934096e-07), array(nan)] and when trying to generate a number with upper or lower as NaN, it gives the Overflow error we are observing (see this link line 1081)
  • When trying another distribution with icdf implemented, like Beta.distr, and run Truncated.dist(Beta.dist(alpha = 5, beta = 2), lower = 0.01).eval() it works. In this case the arguments for the rng_fn function is something like [array(1.835008e-05), array(1.)], the number is then sampled from a uniform distribution and the function finish running.
  • Another experiment that went successful is when I pass lower and upper to Truncated.dist(Gamma.dist(alpha = 5, beta = 2), lower = 0.01, upper = 1000).eval() it finishes without any problems.

I am clueless about why in Gamma the Op does not create the second argument in the case of Truncated(Gamma) but does for other Truncated distributions I have tested. I am stuck in the part where it fails when calling the functiongetattr() inside pytensor.tensor.random.op.rng_fn(), and running the debug mode in this goes to so many places that makes me lost. In summary, I did my best in trying to find where (and if) I can solve this problem within the scope of this PR, and can't proceed here without help with this.

amyoshino avatar Jan 20 '24 22:01 amyoshino

Is the nan coming from pm.logcdf(pm.Gamma.dist(alpha=2, beta=5), np.inf)?

ricardoV94 avatar Jan 29 '24 17:01 ricardoV94

Sounds like PyTensor gammainc C code does not have a special case for np.inf:

https://github.com/pymc-devs/pytensor/blob/19e1a982e5a2e9ba2bd9d708133abf102c6fb835/pytensor/scalar/c_code/gamma.c#L217-L223

In contrast this scipy has a special case? https://github.com/scipy/scipy/blob/4ecf5638c36c3a87b473056f5fffecfeddffd368/scipy/special/cephes/igam.c#L149-L151

ricardoV94 avatar Jan 29 '24 18:01 ricardoV94

pm.logcdf(pm.Gamma.dist(alpha=2, beta=5), np.inf)

Indeed, this code returns nan. I tried to fix it writing the special case for np.inf in the gamma.c function but it didn't work out yet. Maybe there are other places I need to change in order to make it work.

`DEVICE double GammaP (double n, double x) { /* --- regularized Gamma function P / if ((n <= 0) || (x < 0)) return NPY_NAN; / check the function arguments / if (x <= 0) return 0; / treat x = 0 as a special case */ if (x < n+1) return _series(n, x) *exp(n *log(x) -x -logGamma(n)); if (isinf(n)) { if (isinf(x)) return NPY_NAN; return 0; } if (isinf(x)) return 1; return 1 -_cfrac(n, x) *exp(n log(x) -x -logGamma(n)); } / GammaP() */

/--------------------------------------------------------------------/

DEVICE double GammaQ (double n, double x) { /* --- regularized Gamma function Q / if ((n <= 0) || (x < 0)) return NPY_NAN; / check the function arguments / if (x <= 0) return 1; / treat x = 0 as a special case */ if (x < n+1) return 1 -_series(n, x) *exp(n *log(x) -x -logGamma(n)); if (isinf(n)) { if (isinf(x)) return NPY_NAN; return 1; } if (isinf(x)) return 0; return _cfrac(n, x) *exp(n log(x) -x -logGamma(n)); } / GammaQ() */`

Even with this, the pm.logcdf(pm.Gamma.dist(alpha=2, beta=5), np.inf) still returns nan.

amyoshino avatar Feb 04 '24 21:02 amyoshino

You need to delete pytensor cache or bump the Op c_cache_version number. Try pytensor-cache clear or pytensor cache-clear I never remember. If clear doesn't work use purge

ricardoV94 avatar Feb 04 '24 21:02 ricardoV94

Thanks for the pointers! You were on point in all suggestions, the logcdf problem and the solution to it.

After changing the gamma.c function and pytensor-cache clear and pytensor-cache purge (I needed that too), the Truncated icdf worked as expected and all the text_mixture.py tests pass.

I will open an issue and a PR in pytensor to change the gamma.c code.

amyoshino avatar Feb 05 '24 22:02 amyoshino

That PR should probably do a similar fix for the complementary gammainc function (gammaincc?)

Edit: I see you were already onto it

ricardoV94 avatar Feb 05 '24 23:02 ricardoV94

@amyoshino PyTensor new release should be in conda soon. We will need to update the pin on PyTensor version here, as it was a "major" release (or in a separate PR if there are other things that need fix)

https://github.com/pymc-devs/pytensor/releases/tag/rel-2.19.0

ricardoV94 avatar Mar 09 '24 19:03 ricardoV94

@amyoshino PyTensor new release should be in conda soon. We will need to update the pin on PyTensor version here, as it was a "major" release (or in a separate PR if there are other things that need fix)

https://github.com/pymc-devs/pytensor/releases/tag/rel-2.19.0

That's awesome!! Looking forward to it to be updated! Let me know if I need to do anything with respect to this PR as part of the process.

amyoshino avatar Mar 16 '24 01:03 amyoshino

You need to bump the PyTensor dependency constraints on the PyMC side

ricardoV94 avatar Mar 16 '24 07:03 ricardoV94

Ah, I see! Alright, let me try that then! Thanks for the pointer.

amyoshino avatar Mar 16 '24 14:03 amyoshino

Just pushed the bump dependency modifications in here. Seems like there is one test failing though, which wasn't failing when I ran the test suite locally.

amyoshino avatar Mar 19 '24 22:03 amyoshino

@amyoshino I'm taking core of the Pytensor bump in https://github.com/pymc-devs/pymc/pull/7203

ricardoV94 avatar Mar 20 '24 18:03 ricardoV94