pymc
Add icdf functions for Beta, Gamma, Chisquared and StudentT distributions
What is this PR about? Adds ICDF functions to Beta, Gamma, Chisquared and StudentT distributions.
Issue https://github.com/pymc-devs/pymc/issues/6612
This is a work in progress. The quantile functions depend on two main functions:
- Inverse of the regularized incomplete beta function: `betaincinv`
- Inverse of the regularized lower incomplete gamma function: `gammaincinv`
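As a sanity check of why these two inverses are the needed building blocks, the Beta and Gamma quantile functions can be written directly in terms of them using SciPy (this is only an illustration, not the PR code):

```python
import numpy as np
from scipy import special, stats

alpha, beta, q = 2.0, 5.0, 0.3

# Beta quantile is exactly the inverse regularized incomplete beta function
beta_icdf = special.betaincinv(alpha, beta, q)
assert np.isclose(beta_icdf, stats.beta.ppf(q, alpha, beta))

# Gamma quantile (rate parameterization, as in PyMC) is the inverse
# regularized lower incomplete gamma function divided by the rate
gamma_icdf = special.gammaincinv(alpha, q) / beta
assert np.isclose(gamma_icdf, stats.gamma.ppf(q, alpha, scale=1.0 / beta))
```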
Those are implemented in SciPy but not in PyTensor yet. When importing the two functions from `scipy.special` and plugging them into the desired formula, I get a TypeError like the one below:
`TypeError: ufunc 'betaincinv' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''`
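The same error can be reproduced without PyTensor at all: the symbolic variable ends up coerced to an object-dtype array, for which the `betaincinv` ufunc has no loop. A minimal stand-in (the object array here is only a substitute for the real symbolic tensor):

```python
import numpy as np
from scipy.special import betaincinv

# An object-dtype array stands in for a symbolic tensor: the ufunc has no
# loop for object inputs, so numpy raises the TypeError quoted above
symbolic_stand_in = np.array([object()], dtype=object)
try:
    betaincinv(2.0, 3.0, symbolic_stand_in)
    msg = None
except TypeError as e:
    msg = str(e)
assert msg is not None and "betaincinv" in msg
```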
It looks like the most straightforward way to make these icdf functions work is to implement them in PyTensor, in the same way `pt.betainc` has been implemented.
Any comments on whether this is the right path? If so, I am happy to open an issue in PyTensor as a feature request, and possibly to help with its development if nobody else wants to take it. But if there are any ideas on how to make it simpler than that, let me know.
References:
- SciPy betaincinv: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.betaincinv.html
- SciPy gammaincinv: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.gammaincinv.html
- ChiSquared: https://www.wolframalpha.com/input?i=chi+squared+distribution+quantile+function
- Beta: https://www.wolframalpha.com/input?i=beta+distribution+quantile+function
- Gamma: https://www.wolframalpha.com/input?i=gamma+distribution+quantile+function
- StudentT: https://www.wolframalpha.com/input?i=student+distribution+quantile+function
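The Wolfram references above reduce to the same two inverses; a quick SciPy cross-check for the other two distributions (again only an illustration, not the PR code; the StudentT branch shown assumes q < 0.5):

```python
import numpy as np
from scipy import special, stats

q = 0.3
nu = 7.0

# ChiSquared(nu) quantile via the inverse lower incomplete gamma function
chi2_icdf = 2.0 * special.gammaincinv(nu / 2.0, q)
assert np.isclose(chi2_icdf, stats.chi2.ppf(q, nu))

# StudentT(nu) quantile via the inverse incomplete beta (q < 0.5 branch):
# for t < 0 the CDF is 0.5 * betainc(nu/2, 1/2, nu / (nu + t**2))
x = special.betaincinv(nu / 2.0, 0.5, 2.0 * q)
t_icdf = -np.sqrt(nu * (1.0 - x) / x)
assert np.isclose(t_icdf, stats.t.ppf(q, nu))
```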
Checklist
- [ ] Explain important implementation details 👆
- [ ] Make sure that the pre-commit linting/style checks pass.
- [ ] Link relevant issues (preferably in nice commit messages)
- [ ] Are the changes covered by tests and docstrings?
- [ ] Fill out the short summary sections 👇
Major / Breaking Changes
- ...
New features
- ...
Bugfixes
- ...
Documentation
- ...
Maintenance
- ...
:books: Documentation preview :books:: https://pymc--6845.org.readthedocs.build/en/6845/
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 90.97%. Comparing base (a033261) to head (262d868).
:exclamation: Current head 262d868 differs from pull request most recent head 00c1f34. Consider uploading reports for the commit 00c1f34 to get more accurate results
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #6845      +/-   ##
==========================================
- Coverage   92.29%   90.97%    -1.33%
==========================================
  Files         100      100
  Lines       16875    16887       +12
==========================================
- Hits        15575    15363      -212
- Misses       1300     1524      +224
```
| Files | Coverage Δ | |
|---|---|---|
| pymc/distributions/continuous.py | 97.81% <100.00%> (+0.02%) | :arrow_up: |
@ricardoV94 now that https://github.com/pymc-devs/pytensor/pull/502 is merged, we are able to complete this PR.
The tests are failing because `betaincinv` and `gammaincinv` cannot be imported from `pytensor.tensor.math`; I guess it is a matter of time until PyTensor is updated. I will restart the checks when that happens. Locally, all tests pass.
Let me know what you think.
You can see which PyTensor version is being installed in the CI.
Yeah it's strange it's still installing 2.18.4, even though 2.18.5 was released already some days ago. We may need to be just a bit more patient?
> Yeah it's strange it's still installing 2.18.4, even though 2.18.5 was released already some days ago. We may need to be just a bit more patient?
Right! That's what I thought. I saw the PyTensor release 4 days ago and thought it would be a good experiment to try it now and see if the update applies immediately. No rush on it, it will work soon.
@maresb any idea why CI is not yet installing the last PyTensor? Was there an issue on the conda side?
There was a glitch in the deployment of 2.18.5 to anaconda.org.
I reran it, so hopefully it fixes things.
It uploaded successfully on anaconda.org, so I restarted the tests.
Interesting way to restart the tests! I searched around, and the recommendation I found was to push an empty commit, but I think this is nicer.
You can also just ask a dev with write permissions; we can restart them from the GitHub interface.
Apologies, I have re-requested review by mistake.
> Apologies, I have re-requested review by mistake.
No worries!
@ricardoV94 I don't think it is failing due to the changes we've made in this PR. The failures below seem to be unrelated to the icdf functions; let me know if I am mistaken. I've synced the branch, let's see if we still get the errors.
```
tests/distributions/test_mixture.py::TestHurdleMixtures::test_hurdle_zero_draws[dist-0.05-non_psi_args8] FAILED [ 97%]
tests/distributions/test_mixture.py::TestHurdleMixtures::test_hurdle_zero_draws[dist-0.3-non_psi_args9] FAILED [ 97%]
tests/distributions/test_mixture.py::TestHurdleMixtures::test_hurdle_zero_draws[dist-0.4-non_psi_args10] FAILED [ 97%]
tests/distributions/test_mixture.py::TestHurdleMixtures::test_hurdle_zero_draws[dist-0.9-non_psi_args11] FAILED [ 98%]
```
They could actually be related. The hurdle mixture uses truncated RVs for the non-zero part, and with these new icdfs several truncated variables will switch from rejection sampling to inverse-CDF sampling in the random graph.
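To illustrate the switch described here: inverse-CDF sampling of a truncated variable draws a uniform value between the CDF values of the bounds and maps it through the quantile function. A minimal sketch with SciPy (illustrative only, not the PyMC code path):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, beta, lower = 5.0, 2.0, 0.01
dist = stats.gamma(alpha, scale=1.0 / beta)

# Map the truncation bounds through the CDF; an upper bound of +inf must
# map to 1. If the CDF wrongly returned nan at inf, the uniform sampler
# below would receive a nan bound and fail.
u = rng.uniform(dist.cdf(lower), dist.cdf(np.inf), size=10_000)
draws = dist.ppf(u)  # quantile function turns uniforms into truncated draws

assert dist.cdf(np.inf) == 1.0
assert draws.min() >= lower
```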
I restarted, but I would also suggest you check locally
Got it, thanks for the pointers. I am still investigating what the problem is.
I have spent a few hours trying to debug, but I still can't find what is going on.
So far, drawing from Gamma works well, but inside Truncated it does not. The same does not happen with other distributions I tried (Normal, Beta, HalfCauchy, InverseGamma; I picked some that have icdf functions and some that do not).
It looks like the error points to this numpy function, and now I am on a quest to find how to access the high and low passed to the rng for the truncated Gamma: https://github.com/numpy/numpy/blob/448e4dabb5e7d593af7d164a335dc7e44e17ec95/numpy/random/_generator.pyx#L1081
Here is what I have so far, and I will keep you posted on progress. But if by looking at it you have any clues, let me know.
Drawing from Gamma, Beta, etc works as expected (I added some prints for debug purposes):
Drawing from Truncated only fails for Gamma:
I have an update, but it is still a mystery to me why it raises the high - low overflow error when passing only lower but not when passing both lower and upper, and why it only happens with Truncated(Gamma) but not with Truncated(Beta), for example.
Work in progress, but let me know if you have any ideas on where to look.
I just narrowed the problem down to the following: as stated by @ricardoV94, the hurdle mixture uses truncated RVs for the non-zero part. The failing tests break when passing Gamma.dist to Truncated.dist.
- When I try to simulate the part that breaks the test with `Truncated.dist(Gamma.dist(alpha=5, beta=2), lower=0.01).eval()`, it fails in the middle when trying to sample from a uniform distribution with `UniformRV.rng_fn`. It does so because the inputs to `rng_fn` in `tensor/random/op.py` are something like `[array(7.64934096e-07), array(nan)]`, and when trying to generate a number with upper or lower as NaN, it gives the overflow error we are observing (see this link, line 1081).
- When trying another distribution with icdf implemented, like `Beta.dist`, and running `Truncated.dist(Beta.dist(alpha=5, beta=2), lower=0.01).eval()`, it works. In this case the arguments to `rng_fn` are something like `[array(1.835008e-05), array(1.)]`; the number is then sampled from a uniform distribution and the function finishes running.
- Another experiment that succeeded: when I pass both lower and upper with `Truncated.dist(Gamma.dist(alpha=5, beta=2), lower=0.01, upper=1000).eval()`, it finishes without any problems.
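The nan bound can be reproduced in isolation: numpy's `Generator.uniform` validates `high - low` for finiteness and raises the overflow error from the linked line. A minimal stand-in for what `UniformRV.rng_fn` receives in the two cases above:

```python
import numpy as np

rng = np.random.default_rng()

# Healthy case: finite bounds, as in the Beta example above
ok = rng.uniform(1.835008e-05, 1.0)
assert 0.0 <= ok <= 1.0

# Broken case: a nan upper bound, as produced for Truncated(Gamma);
# high - low is nan, which fails numpy's finiteness check
try:
    rng.uniform(7.64934096e-07, np.nan)
    error = None
except OverflowError as e:
    error = str(e)
assert error is not None
```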
I am clueless about why the Op does not create the second argument in the case of Truncated(Gamma) but does for other Truncated distributions I have tested. I am stuck at the point where it fails when calling `getattr()` inside `pytensor.tensor.random.op.rng_fn()`, and running debug mode from there goes to so many places that I get lost. In summary, I did my best to find where (and whether) I can solve this problem within the scope of this PR, and I can't proceed without help on this.
Is the nan coming from pm.logcdf(pm.Gamma.dist(alpha=2, beta=5), np.inf)?
Sounds like the PyTensor gammainc C code does not have a special case for np.inf:
https://github.com/pymc-devs/pytensor/blob/19e1a982e5a2e9ba2bd9d708133abf102c6fb835/pytensor/scalar/c_code/gamma.c#L217-L223
In contrast, SciPy has a special case: https://github.com/scipy/scipy/blob/4ecf5638c36c3a87b473056f5fffecfeddffd368/scipy/special/cephes/igam.c#L149-L151
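The SciPy behavior being contrasted here is easy to check directly:

```python
import numpy as np
from scipy import special

# SciPy special-cases x = inf in the regularized incomplete gamma
# functions, which is what makes the Gamma CDF at inf come out as 1
# (and hence logcdf as 0) instead of nan
assert special.gammainc(2.0, np.inf) == 1.0   # lower: P(a, inf) = 1
assert special.gammaincc(2.0, np.inf) == 0.0  # upper: Q(a, inf) = 0
```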
Indeed, `pm.logcdf(pm.Gamma.dist(alpha=2, beta=5), np.inf)` returns nan. I tried to fix it by writing the special case for np.inf in the gamma.c function, but it hasn't worked out yet. Maybe there are other places I need to change to make it work.
```c
DEVICE double GammaP (double n, double x)
{                               /* --- regularized Gamma function P */
  if ((n <= 0) || (x < 0)) return NPY_NAN;  /* check the function arguments */
  if (x <= 0) return 0;                     /* treat x = 0 as a special case */
  if (x < n+1) return _series(n, x) * exp(n * log(x) - x - logGamma(n));
  if (isinf(n)) {
    if (isinf(x)) return NPY_NAN;
    return 0;
  }
  if (isinf(x)) return 1;
  return 1 - _cfrac(n, x) * exp(n * log(x) - x - logGamma(n));
}  /* GammaP() */

/*--------------------------------------------------------------------*/

DEVICE double GammaQ (double n, double x)
{                               /* --- regularized Gamma function Q */
  if ((n <= 0) || (x < 0)) return NPY_NAN;  /* check the function arguments */
  if (x <= 0) return 1;                     /* treat x = 0 as a special case */
  if (x < n+1) return 1 - _series(n, x) * exp(n * log(x) - x - logGamma(n));
  if (isinf(n)) {
    if (isinf(x)) return NPY_NAN;
    return 1;
  }
  if (isinf(x)) return 0;
  return _cfrac(n, x) * exp(n * log(x) - x - logGamma(n));
}  /* GammaQ() */
```
Even with this, `pm.logcdf(pm.Gamma.dist(alpha=2, beta=5), np.inf)` still returns nan.
You need to delete the PyTensor cache or bump the Op `c_cache_version` number. Try `pytensor-cache clear` or `pytensor cache-clear`, I never remember. If clear doesn't work, use purge.
Thanks for the pointers! You were on point with all suggestions, both the logcdf problem and the solution to it.
After changing the gamma.c function and running `pytensor-cache clear` and `pytensor-cache purge` (I needed that too), the Truncated icdf worked as expected and all the test_mixture.py tests pass.
I will open an issue and a PR in PyTensor to change the gamma.c code.
That PR should probably do a similar fix for the complementary gammainc function (gammaincc?)
Edit: I see you were already onto it
@amyoshino PyTensor new release should be in conda soon. We will need to update the pin on PyTensor version here, as it was a "major" release (or in a separate PR if there are other things that need fix)
https://github.com/pymc-devs/pytensor/releases/tag/rel-2.19.0
> @amyoshino PyTensor new release should be in conda soon. We will need to update the pin on PyTensor version here, as it was a "major" release (or in a separate PR if there are other things that need fix)
> https://github.com/pymc-devs/pytensor/releases/tag/rel-2.19.0
That's awesome!! Looking forward to it being updated! Let me know if I need to do anything with respect to this PR as part of the process.
You need to bump the PyTensor dependency constraints on the PyMC side
Ah, I see! Alright, let me try that then! Thanks for the pointer.
Just pushed the dependency bump modifications here. It seems like there is one test failing, though, which wasn't failing when I ran the test suite locally.
@amyoshino I'm taking care of the PyTensor bump in https://github.com/pymc-devs/pymc/pull/7203