Request: Add Gaussian Hypergeometric Function
Description of your problem or feature request
I'm currently rewriting the Lifetimes library to run on PyMC, but the likelihood function of one of the models requires a Gaussian hypergeometric function written in Aesara. This is a special case of the generalized hypergeometric function and a separate matter from the hypergeometric distribution, which Aesara already implements in HyperGeometricRV.
The SciPy implementation is written in Cython and I'm not quite sure how to go about brute-forcing this in aesara.tensor, but given enough assistance I may be able to whip something up for a future PR.
Most of our scipy.special implementations are in aesara.scalar.math, so that module contains a lot of copy-paste-able examples for implementing an Op for scipy.special.hyp2f1.
Once a ScalarOp variant is present, one need only add the following to aesara.tensor.math to make it available in tensor form:
@scalar_elemwise
def hyp2f1(a, b, c, z):
"""Gauss hypergeometric function ``2F1(a, b; c; z)``."""
...
__all__ = [
...
"hyp2f1",
]
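As a quick sanity check on the SciPy side (independent of Aesara), hyp2f1 can be compared against the standard closed-form identity 2F1(1, 1; 2; z) = -log(1 - z)/z; the test point below is arbitrary:

```python
import numpy as np
from scipy.special import hyp2f1

# standard identity: 2F1(1, 1; 2; z) = -log(1 - z) / z for 0 < |z| < 1
z = 0.5
assert np.isclose(hyp2f1(1.0, 1.0, 2.0, z), -np.log(1.0 - z) / z)
```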
The C-backend implementation might be able to call the C/Cython code exposed by SciPy directly, too, which could considerably simplify the implementation.
Thanks! I don't see a lot of documentation at this time for working with Cython code, but the ScalarOp implementation seems more straightforward than anticipated, so I'll proceed with that approach for now and post any issues I encounter here before creating a PR.
OK, I've forked Aesara and started working on this in a new branch, but I want to ensure all the tests are passing before creating a PR. Here's what I have so far.
In aesara.scalar.math:
class Hyp2F1(ScalarOp):
    """
    Gaussian hypergeometric function ``2F1(a, b; c; z)``.
    """

    nin = 4
    nfunc_spec = ("scipy.special.hyp2f1", 4, 1)

    @staticmethod
    def st_impl(a, b, c, z):
        return scipy.special.hyp2f1(a, b, c, z)

    def impl(self, a, b, c, z):
        return Hyp2F1.st_impl(a, b, c, z)

    def grad(self, inputs, grads):
        a, b, c, z = inputs
        (gz,) = grads
        # pseudocode: the four partial derivatives still need real expressions
        return [
            gz * hyp2f1_der_wrt_a,
            gz * hyp2f1_der_wrt_b,
            gz * hyp2f1_der_wrt_c,
            gz * hyp2f1_der_wrt_z,
        ]

    def c_code(self, *args, **kwargs):
        raise NotImplementedError()


hyp2f1 = Hyp2F1(upgrade_to_float, name="hyp2f1")
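For the z input at least, a closed-form gradient exists in terms of hyp2f1 itself, via the standard contiguous relation d/dz 2F1(a, b; c; z) = (a*b/c) 2F1(a+1, b+1; c+1; z), which could stand in for hyp2f1_der_wrt_z. A quick numerical check of the identity (the test values are arbitrary):

```python
from scipy.special import hyp2f1

# contiguous relation: d/dz 2F1(a, b; c; z) = (a * b / c) * 2F1(a+1, b+1; c+1; z)
a, b, c, z, h = 1.5, 2.0, 3.0, 0.25, 1e-6
analytic = (a * b / c) * hyp2f1(a + 1, b + 1, c + 1, z)
numeric = (hyp2f1(a, b, c, z + h) - hyp2f1(a, b, c, z - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-5
```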
In aesara.tensor.math:
@scalar_elemwise
def hyp2f1(a, b, c, z):
    """Gaussian hypergeometric function ``2F1(a, b; c; z)``."""
In aesara.tensor.inplace:
@scalar_elemwise
def hyp2f1_inplace(a, b, c, z):
    """Gaussian hypergeometric function ``2F1(a, b; c; z)`` (inplace)."""
In tests.tensor.test_math_scipy:
_good_broadcast_quaternary_hyp2f1 = dict(
    normal=(
        random_ranged(0, 1000, (2, 3)),
        random_ranged(0, 1000, (2, 3)),
        random_ranged(0, 1, (2, 3)),
        random_ranged(0, 1, (2, 3)),
    ),
)

_grad_broadcast_quaternary_hyp2f1 = dict(
    normal=(
        random_ranged(0, 1000, (2, 3)),
        random_ranged(0, 1000, (2, 3)),
        random_ranged(0, 1, (2, 3)),
        random_ranged(0, 1, (2, 3)),
    ),
)

TestHyp2F1Broadcast = makeBroadcastTester(
    op=at.hyp2f1,
    expected=expected_hyp2f1,
    good=_good_broadcast_quaternary_hyp2f1,
    grad=_grad_broadcast_quaternary_hyp2f1,
    eps=2e-10,
    mode=mode_no_scipy,
)

TestHyp2F1InplaceBroadcast = makeBroadcastTester(
    op=inplace.hyp2f1_inplace,
    expected=expected_hyp2f1,
    good=_good_broadcast_quaternary_hyp2f1,
    grad=_grad_broadcast_quaternary_hyp2f1,
    inplace=True,
)
All 4 (of 10) failing tests are due to these reasons, which are the same for *Broadcast and *InplaceBroadcast:
test_good - AssertionError: Test Elemwise{hyp2f1_inplace,inplace}. . .
test_grad - IndexError: ('list index out of range', 'Test Elemwise{hyp2f1_inpl. . .
https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.hyp2f1.html https://github.com/scipy/scipy/blob/main/scipy/special/tests/test_hyp2f1.py
The grad looks wrong. For the broadcasting test, could it be that the scipy function is not vectorized across all inputs (i.e., some of them must always be scalar)?
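For what it's worth, a quick check suggests the SciPy function is a NumPy ufunc that does broadcast across all four inputs, so the vectorization theory can be tested directly:

```python
import numpy as np
from scipy.special import hyp2f1

# hyp2f1 is a NumPy ufunc, so scalars and arrays broadcast across all four arguments
a = np.full((2, 3), 1.5)
z = np.linspace(0.1, 0.6, 6).reshape(2, 3)
out = hyp2f1(a, 2.0, 3.0, z)
assert out.shape == (2, 3)
assert np.all(np.isfinite(out))
```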
I've done some digging and found the derivations for this function:
https://functions.wolfram.com/HypergeometricFunctions/Hypergeometric2F1/20/01/
I've modified my previous post with some pseudocode for the grad method. However, these derivatives seem to vary based on the value of the z input, which could also be throwing off the tests. They also all require the Pochhammer symbol, which I suppose I'll need to create an Op for as well. Fortunately, those gradients are more straightforward:
https://functions.wolfram.com/GammaBetaErf/Pochhammer/20/01/
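As a stopgap before a dedicated Op exists, SciPy already exposes the Pochhammer symbol as scipy.special.poch, and the derivative identity from the Wolfram page can be checked numerically (the test values below are arbitrary):

```python
from scipy.special import poch, digamma, gamma

# Pochhammer (rising factorial): (a)_k = gamma(a + k) / gamma(a)
assert abs(poch(3.0, 2.0) - 12.0) < 1e-9                       # 3 * 4
assert abs(poch(2.5, 3.0) - gamma(5.5) / gamma(2.5)) < 1e-9    # 2.5 * 3.5 * 4.5

# d/da (a)_k = (a)_k * (digamma(a + k) - digamma(a)), per the Wolfram reference
a, k, h = 2.5, 3.0, 1e-6
analytic = poch(a, k) * (digamma(a + k) - digamma(a))
numeric = (poch(a + h, k) - poch(a - h, k)) / (2 * h)
assert abs(analytic - numeric) < 1e-4
```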
You can specify that the gradients (wrt some/all inputs) are not implemented. That's totally fine if you don't need them or they require too much work right now. Of course, it's always better to have them!
Here is one such example: https://github.com/aesara-devs/aesara/blob/e40c827462ff2956010794ac94a38e70ae3a3131/aesara/scalar/math.py#L932-L938
Thanks. After looking into the likelihood function for my specific use case, gradients only need to be implemented for |z| < 1:
(See expressions 19 and 20) https://www.brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf
Implementing gradients for |z| >= 1 will require creating another Op for the HypergeometricPFQ function.
Also, these derivatives require summing to infinity. I've found a few suggestions on how to go about this on Stack Overflow, but surely this is already being done for an Op elsewhere?
Yeah, we don't need to implement every case right away; just enough for the reasonable and more or less anticipated cases.
This is almost ready for a PR. The only test failing now for TestHyp2F1Broadcast and TestHyp2F1InplaceBroadcast is test_grad.
I'm using the mpmath library (a transitive dependency of Aesara) for the infinite summations required by these derivatives, but it has its own data types that Aesara doesn't like:
TypeError: ('float() argument must be a string or a number, not \'mpc\'\nApply node that caused the error: Elemwise{hyp2f1_der}
I've alternatively seen Variable in place of mpc. I also tried finite summations via scipy, but I'm hitting runtime overflows before the derivatives converge to the correct values. Here's what the mpmath implementation looks like:
import mpmath as mp

def impl(self, a, b, c, z, wrt):
    def _hyp2f1_da(a, b, c, z):
        """
        Derivative of hyp2f1 wrt a
        """
        if abs(z) >= 1:
            return grad_not_implemented(self, 0, a)
        else:
            term1 = mp.nsum(
                lambda k: (mp.rf(a, k) * mp.rf(b, k) * mp.digamma(a + k) * (z**k))
                / (mp.rf(c, k) * mp.fac(k)),
                [0, mp.mpf("inf")],
            )
            term2 = mp.digamma(a) * mp.hyp2f1(a, b, c, z)
            return term1 - term2
I tried casting these term variables to floats, but Aesara still finds these mpmath functions disagreeable.
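In case it helps, the same |z| < 1 series can be summed in plain Python with scipy.special building blocks, sidestepping mpmath's mpf/mpc types entirely. This is a rough sketch that truncates the sum once terms fall below a tolerance, not a drop-in for the Op's impl; it folds the digamma(a) * 2F1 term into the sum so each term is a plain float:

```python
import math
from scipy.special import poch, digamma, hyp2f1

def hyp2f1_da(a, b, c, z, tol=1e-12, max_terms=500):
    """Series for d/da 2F1(a, b; c; z), valid for |z| < 1.

    Sums (a)_k (b)_k (digamma(a + k) - digamma(a)) z^k / ((c)_k k!),
    which equals term1 - term2 from the mpmath version above.
    """
    total = 0.0
    for k in range(max_terms):
        term = (
            poch(a, k) * poch(b, k) * (digamma(a + k) - digamma(a)) * z**k
            / (poch(c, k) * math.factorial(k))
        )
        total += term
        if k > 0 and abs(term) < tol:
            break
    return total

# check against a central finite difference of scipy's hyp2f1
a, b, c, z, h = 1.5, 2.0, 3.0, 0.4, 1e-6
fd = (hyp2f1(a + h, b, c, z) - hyp2f1(a - h, b, c, z)) / (2 * h)
assert abs(hyp2f1_da(a, b, c, z) - fd) < 1e-4
```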
If you wanted to use mpmath, you would need to wrap that in an Op itself. Grad must return Aesara expressions.
Have a look at the gradient of betainc which wraps a vanilla Python expression.
Sorry for not providing a more complete code example. These methods are already being wrapped in an Op:
class Hyp2F1Der(ScalarOp):
    """
    Derivatives of the Gaussian hypergeometric function ``2F1(a, b; c; z)``.
    """

    nin = 5

    def impl(self, a, b, c, z, wrt):
        def _hyp2f1_da(a, b, c, z):
            """
            Derivative of hyp2f1 wrt a
            """
            ......

        if wrt == 0:
            return _hyp2f1_da(a, b, c, z)
        elif wrt == 1:
            .....

    def c_code(self, *args, **kwargs):
        raise NotImplementedError()


hyp2f1_der = Hyp2F1Der(upgrade_to_float, name="hyp2f1_der")
Does this need to be included in aesara.tensor.math or elsewhere? I didn't see betainc_der in that module.
Perhaps you could share a gist with all the code?
I haven't looked carefully; the thing that jumped out at me is that grad_not_implemented. You shouldn't return it in the perform/impl method. In that case it should be a vanilla NotImplementedError.
Perhaps you could share a gist with all the code?
Better yet, @ColtAllen, feel free to create a draft PR. That's much better suited for this level of detail/discussion (i.e. the kind that involves prototype implementations).
Regarding the gradient of hyp2f1 wrt. a (or anything else), the options are usually as follows:
1. find a representation of the gradient in terms of existing Ops,
2. use a representation that requires a few missing Ops and add those,
3. create a single Op that represents the entire gradient computation and return that in the Op.grad.
Option 1. is usually the best, but sometimes representations that use existing Ops don't exist (which could be the case here) and/or they don't apply to the desired inputs or ranges of values.
Option 2. is good to try next, but it looks like the representation you're considering involves an infinite sum that would need to be approximated at the graph level by a Scan (i.e. looping) node. The underlying computations would be more or less equivalent to the converging for-loop infinite sum approximations one would write in Python/C. (N.B. these are likely the same iterations used by mpmath at some point, just in a somewhat more costly arbitrary-precision setting.)
When the graphs required to compute such quantities are extremely complicated and/or an efficient numerical implementation already exists in Python/C, Option 3. is worth considering.
Just so you know, we're reluctant to include "composite" Ops within Aesara itself, since they implicitly undermine our design and optimization efforts; however, we may do so after careful consideration. If you're interested in getting something at least as performant as a known Python/C implementation running sooner rather than later, then a custom gradient Op employed by a custom hyp2f1 Op is likely the best approach.
Awesome; if I can approximate an infinite sum via Scan, then I could write this entirely in terms of existing Ops. What would that approximation look like? Here's my first stab at it:
components, updates = aesara.scan(
    fn=lambda k, a, b, c, z: (rf(a, k) * rf(b, k) * digamma(a + k) * (z**k))
    / (rf(c, k) * fac(k)),
    outputs_info=None,
    sequences=[aesara.tensor.arange(inf)],
    non_sequences=[a, b, c, z],
)
gradient = components.sum()
I would also need to add Ops for the rising factorials and factorials, but those are rather straightforward.
Just like Python/C/etc., a loop with a convergence-based termination criterion is needed. In Aesara, conditions can be added to a Scan via aesara.scan.utils.until. See here.
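The condition that until would encode is the usual "stop once the next term no longer changes the partial sum" test. A plain-Python analogue of that loop, using the standard term recurrence for the 2F1 series (valid for |z| < 1; this is a sketch of the termination logic, not Aesara code):

```python
from scipy.special import hyp2f1

def hyp2f1_series(a, b, c, z, tol=1e-15, max_terms=1000):
    """Sum the 2F1 series with a convergence-based stop, for |z| < 1."""
    term, total = 1.0, 1.0  # k = 0 term of the series
    for k in range(max_terms):
        # ratio of consecutive series terms: t_{k+1} / t_k
        term *= (a + k) * (b + k) * z / ((c + k) * (k + 1))
        total += term
        if abs(term) < tol * abs(total):  # the until-style condition
            break
    return total

# agrees with scipy's implementation at an interior point
assert abs(hyp2f1_series(1.5, 2.0, 3.0, 0.4) - hyp2f1(1.5, 2.0, 3.0, 0.4)) < 1e-10
```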
Note that you cannot use Scan for the gradient of an Elemwise at the moment: https://github.com/aesara-devs/aesara/issues/512
That's why we have a wrapped Python loop for the gradient of betainc, instead of an Aesara Scan.
Note that you cannot use Scan for the gradient of an Elemwise at the moment: #512
Ah, yeah, we need to get that out of the way.
@brandonwillard Draft PR has been created.