
Request: Add Gaussian Hypergeometric Function

Open ColtAllen opened this issue 1 year ago • 19 comments

Description of your problem or feature request

I'm currently rewriting the Lifetimes library to run on PyMC, but the likelihood function of one of the models requires a Gaussian Hypergeometric function be written in Aesara. This is a special case of the generalized hypergeometric function and a separate matter from the hypergeometric distribution, which Aesara already implements in HyperGeometricRV.

The SciPy implementation is written in Cython and I'm not quite sure how to go about brute-forcing this in aesara.tensor, but given enough assistance I may be able to whip something up for a future PR.

ColtAllen avatar Jul 11 '22 14:07 ColtAllen

Most of our scipy.special implementations are in aesara.scalar.math, so that module contains a lot of copy-paste-able examples for implementing an Op for scipy.special.hyp2f1.

Once a ScalarOp variant is present, one need only add the following to aesara.tensor.math in order to make it available in tensor form:

@scalar_elemwise
def hyp2f1(a, b, c, z):
    """Gauss hypergeometric function ``2F1(a, b; c; z)``."""

...

__all__ = [
    ...
    "hyp2f1",
]
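For example, once the ScalarOp and the tensor wrapper are in place, usage would look roughly like this (a sketch; the expected value comes from the closed form 2F1(1, 1; 2; z) = -log(1 - z) / z):

import aesara
import aesara.tensor as at

a, b, c, z = at.dscalars("a", "b", "c", "z")
out = at.hyp2f1(a, b, c, z)
f = aesara.function([a, b, c, z], out)
print(f(1.0, 1.0, 2.0, 0.5))  # ~1.3863, i.e. -log(0.5) / 0.5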

The C-backend implementation might be able to call the C/Cython code exposed by SciPy directly, too, which could considerably simplify the implementation.
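For instance, if a plain C entry point for hyp2f1 could be linked in, the C implementation might reduce to something like the following sketch (the symbol name hyp2f1_c and the linking details are assumptions, not an API SciPy is known to expose):

    def c_support_code(self, **kwargs):
        # Forward declaration of the external routine assumed to be linked in.
        return "double hyp2f1_c(double a, double b, double c, double z);"

    def c_code(self, node, name, inputs, outputs, sub):
        (a, b, c, z) = inputs
        (out,) = outputs
        return f"{out} = hyp2f1_c({a}, {b}, {c}, {z});"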

brandonwillard avatar Jul 11 '22 20:07 brandonwillard

Thanks! I don't see a lot of documentation at this time for working with Cython code, but the ScalarOp implementation seems more straightforward than anticipated, so I'll proceed with that approach for now and post any issues I encounter here before creating a PR.

ColtAllen avatar Jul 17 '22 23:07 ColtAllen

OK, I've forked Aesara and started working on this in a new branch, but I want to ensure all the tests are passing before creating a PR. Here's what I have so far.

In aesara.scalar.math:

class Hyp2F1(ScalarOp):
    """
    Gaussian hypergeometric function ``2F1(a, b; c; z)``.

    """

    nin = 4
    nfunc_spec = ("scipy.special.hyp2f1", 4, 1)

    @staticmethod
    def st_impl(a, b, c, z):
        return scipy.special.hyp2f1(a, b, c, z)

    def impl(self, a, b, c, z):
        return Hyp2F1.st_impl(a, b, c, z)

    def grad(self, inputs, grads):
        a, b, c, z = inputs
        (gz,) = grads
        # NOTE: the hyp2f1_der_wrt_* terms below are placeholders for the
        # derivative expressions/Ops that still need to be implemented.
        return [
            gz * hyp2f1_der_wrt_a,
            gz * hyp2f1_der_wrt_b,
            gz * hyp2f1_der_wrt_c,
            gz * hyp2f1_der_wrt_z,
        ]

    def c_code(self, *args, **kwargs):
        raise NotImplementedError()


hyp2f1 = Hyp2F1(upgrade_to_float, name="hyp2f1")

In aesara.tensor.math:

@scalar_elemwise
def hyp2f1(a, b, c, z):
    """gaussian hypergeometric function"""

In aesara.tensor.inplace:

@scalar_elemwise
def hyp2f1_inplace(a, b, c, z):
    """gaussian hypergeometric function"""

In tests.tensor.test_math_scipy:

_good_broadcast_quaternary_hyp2f1 = dict(
    normal=(
        random_ranged(0, 1000, (2, 3)),
        random_ranged(0, 1000, (2, 3)),
        random_ranged(0, 1, (2, 3)),
        random_ranged(0, 1, (2, 3)),
    ),
)

_grad_broadcast_quaternary_hyp2f1 = dict(
    normal=(
        random_ranged(0, 1000, (2, 3)),
        random_ranged(0, 1000, (2, 3)),
        random_ranged(0, 1, (2, 3)),
        random_ranged(0, 1, (2, 3)),
    ),
)

TestHyp2F1Broadcast = makeBroadcastTester(
    op=at.hyp2f1,
    expected=expected_hyp2f1,
    good=_good_broadcast_quaternary_hyp2f1,
    grad=_grad_broadcast_quaternary_hyp2f1,
    eps=2e-10,
    mode=mode_no_scipy,
)

TestHyp2F1InplaceBroadcast = makeBroadcastTester(
    op=inplace.hyp2f1_inplace,
    expected=expected_hyp2f1,
    good=_good_broadcast_quaternary_hyp2f1,
    grad=_grad_broadcast_quaternary_hyp2f1,
    inplace=True,
)

4 of the 10 tests are failing, all for the following reasons (which are the same for *Broadcast and *InplaceBroadcast):

test_good - AssertionError: Test Elemwise{hyp2f1_inplace,inplace}. . .
test_grad - IndexError: ('list index out of range', 'Test Elemwise{hyp2f1_inpl. . .

https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.hyp2f1.html https://github.com/scipy/scipy/blob/main/scipy/special/tests/test_hyp2f1.py

ColtAllen avatar Sep 03 '22 21:09 ColtAllen

The grad looks wrong. For the broadcasting test, could it be that the scipy function is not vectorized across all inputs (i.e., some of them must always be scalar)?
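A quick way to check that (a sketch, assuming the same ranges as the failing test data): scipy.special.hyp2f1 is a ufunc, so it should broadcast over all four arguments.

import numpy as np
from scipy.special import hyp2f1

a = np.random.uniform(0, 1000, (2, 3))
b = np.random.uniform(0, 1000, (2, 3))
c = np.random.uniform(0, 1, (2, 3))
z = np.random.uniform(0, 1, (2, 3))
print(hyp2f1(a, b, c, z).shape)  # (2, 3) if fully vectorized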

ricardoV94 avatar Sep 04 '22 06:09 ricardoV94

I've done some digging and found the derivatives of this function:

https://functions.wolfram.com/HypergeometricFunctions/Hypergeometric2F1/20/01/

I've modified my previous post with some pseudocode for the grad method. However, these derivatives vary based on the value of the z input, which could also be throwing off the tests. They also all require the Pochhammer symbol, which I suppose I'll need to create an Op for as well. Fortunately, those gradients are more straightforward:

https://functions.wolfram.com/GammaBetaErf/Pochhammer/20/01/
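For reference, one common form of the derivative with respect to a (valid for |z| < 1) is:

\frac{\partial}{\partial a}\, {}_2F_1(a, b; c; z)
  = \sum_{k=0}^{\infty} \frac{(a)_k (b)_k}{(c)_k\, k!} \bigl(\psi(a + k) - \psi(a)\bigr) z^k,
\qquad (a)_k = \frac{\Gamma(a + k)}{\Gamma(a)}

where \psi is the digamma function and (a)_k the Pochhammer symbol; the derivatives with respect to b and c have an analogous structure.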

ColtAllen avatar Sep 04 '22 14:09 ColtAllen

You can specify that the gradients (wrt some or all inputs) are not implemented. That's totally fine if you don't need them or they'd require too much work right now. Of course, it's always better to have them!

ricardoV94 avatar Sep 04 '22 14:09 ricardoV94

Here is one such example: https://github.com/aesara-devs/aesara/blob/e40c827462ff2956010794ac94a38e70ae3a3131/aesara/scalar/math.py#L932-L938
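Adapted to this Op, the pattern from the linked example would look roughly like this (a sketch; which inputs to skip is up to you, and grad_not_implemented lives in aesara.gradient):

from aesara.gradient import grad_not_implemented

    def grad(self, inputs, grads):
        a, b, c, z = inputs
        (gz,) = grads
        return [
            grad_not_implemented(self, 0, a),
            grad_not_implemented(self, 1, b),
            grad_not_implemented(self, 2, c),
            grad_not_implemented(self, 3, z),
        ]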

ricardoV94 avatar Sep 04 '22 14:09 ricardoV94

Thanks. After looking into the likelihood function for my specific use case, gradients only need to be implemented for |z| < 1:

(See expressions 19 and 20) https://www.brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf

Implementing gradients for |z| >= 1 will require creating another Op for the HypergeometricPFQ function.

Also, these derivatives require summing an infinite series. I've found a few suggestions on how to go about this on Stack Overflow, but surely this is already being done for an Op elsewhere?

ColtAllen avatar Sep 04 '22 14:09 ColtAllen

Thanks. After looking into the likelihood function for my specific use case, gradients only need to be implemented for |z| < 1:

(See expressions 19 and 20) https://www.brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf

Implementing gradients for |z| >= 1 will require creating another Op for the HypergeometricPFQ function.

Also, these derivatives require summing an infinite series. I've found a few suggestions on how to go about this on Stack Overflow, but surely this is already being done for an Op elsewhere?

Yeah, we don't need to implement every case right away; just enough for the reasonable and more or less anticipated cases.

brandonwillard avatar Sep 04 '22 19:09 brandonwillard

This is almost ready for a PR. The only test failing now for TestHyp2F1Broadcast and TestHyp2F1InplaceBroadcast is test_grad.

I'm using the mpmath library (a transitive dependency of Aesara) for the infinite summations required by these derivatives, but it has its own numeric data types that Aesara doesn't like:

TypeError: ('float() argument must be a string or a number, not \'mpc\'\nApply node that caused the error: Elemwise{hyp2f1_der}

I've also seen Variable in place of mpc in that error. I tried finite summations via SciPy as well, but I'm hitting runtime overflows before the derivatives converge to the correct values. Here's what the mpmath implementation looks like:

import mpmath as mp

    def impl(self, a, b, c, z, wrt):
        def _hyp2f1_da(a, b, c, z):
            """
            Derivative of hyp2f1 wrt a
            """

            if abs(z) >= 1:
                return grad_not_implemented(self, 0, a)

            else:

                term1 = mp.nsum(
                    lambda k: (mp.rf(a, k) * mp.rf(b, k) * mp.digamma(a + k) * (z**k))
                    / (mp.rf(c, k) * mp.fac(k)),
                    [0, mp.mpf("inf")],
                )
                term2 = mp.digamma(a) * mp.hyp2f1(a, b, c, z)

                return term1 - term2

I tried casting these term variables to floats, but Aesara still finds these mpmath functions disagreeable.

ColtAllen avatar Sep 11 '22 21:09 ColtAllen

If you wanted to use mpmath, you would need to wrap that in an Op itself. The grad method must return Aesara expressions.

Have a look at the gradient of betainc which wraps a vanilla Python expression.

ricardoV94 avatar Sep 12 '22 05:09 ricardoV94

Sorry for not providing a more complete code example. These methods are already being wrapped in an Op:

class Hyp2F1Der(ScalarOp):
    """
    Derivatives of the Gaussian hypergeometric function ``2F1(a, b; c; z)``.
    """

    nin = 5

    def impl(self, a, b, c, z, wrt):
        def _hyp2f1_da(a, b, c, z):
            """
            Derivative of hyp2f1 wrt a
            """
            ......
    
        if wrt == 0:
            return _hyp2f1_da(a, b, c, z)
        elif wrt == 1:
            .....

    def c_code(self, *args, **kwargs):
        raise NotImplementedError()


hyp2f1_der = Hyp2F1Der(upgrade_to_float, name="hyp2f1_der")

Does this need to be included in aesara.tensor.math or elsewhere? I didn't see betainc_der in that module.

ColtAllen avatar Sep 12 '22 15:09 ColtAllen

Could you perhaps share a gist with all the code?

I haven't looked carefully; the thing that jumped out at me is that grad_not_implemented. You shouldn't return it from the perform/impl method. In that case it should be a vanilla NotImplementedError.
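In other words, something along these lines inside impl (a sketch mirroring the structure of the earlier snippet; _hyp2f1_da is the inner helper from that code):

    def impl(self, a, b, c, z, wrt):
        def _hyp2f1_da(a, b, c, z):
            ...  # series implementation from the earlier snippet

        if abs(z) >= 1:
            # Fail loudly here; grad_not_implemented only makes sense inside Op.grad.
            raise NotImplementedError(
                "Derivatives of hyp2f1 are only implemented for |z| < 1"
            )
        if wrt == 0:
            return _hyp2f1_da(a, b, c, z)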

ricardoV94 avatar Sep 12 '22 15:09 ricardoV94

Could you perhaps share a gist with all the code?

Better yet, @ColtAllen, feel free to create a draft PR. That's much better suited for this level of detail/discussion (i.e. the kind that involves prototype implementations).

brandonwillard avatar Sep 12 '22 23:09 brandonwillard

Regarding the gradient of hyp2f1 wrt. a (or anything else), the options are usually as follows:

  1. find a representation of the gradient in terms of existing Ops,
  2. use a representation that requires a few missing Ops and add those,
  3. create a single Op that represents the entire gradient computation and return that in the Op.grad.

Option 1. is usually the best, but sometimes representations that use existing Ops don't exist—which could be the case here—and/or they don't apply to the desired inputs or ranges of values.

Option 2. is good to try next, but it looks like the representation you're considering involves an infinite sum that would need to be approximated at the graph-level by a Scan (i.e. looping) node. The underlying computations would be more or less equivalent to the converging for-loop infinite sum approximations one would write in Python/C. (N.B. these are likely the same iterations used by mpmath at some point, just in a somewhat more costly infinite precision setting.)

When the graphs required to compute such quantities are extremely complicated and/or an efficient numerical implementation already exists in Python/C, Option 3. is worth considering.

Just so you know, we're reluctant to include "composite" Ops within Aesara itself, since they implicitly undermine our design and optimization efforts; however, we may do so after careful consideration. If you're interested in getting something at least as performant as a known Python/C implementation running sooner than later, then a custom gradient Op employed by a custom hyp2f1 Op is likely the best approach.

brandonwillard avatar Sep 13 '22 00:09 brandonwillard

Awesome; if I can approximate an infinite sum via Scan, then I could write this entirely in terms of existing Ops. What would that approximation look like? Here's my first stab at it:

components, updates = aesara.scan(fn=lambda k, a, b, c, z: (rf(a, k) * rf(b, k) * digamma(a + k) * (z**k)) / (rf(c, k) * fac(k)),
                                  outputs_info=None,
                                  sequences=[aesara.tensor.arange(inf)],
                                  non_sequences=[a, b, c, z])
gradient = components.sum()

I would also need to add Ops for the rising factorials and factorials, but those are rather straightforward.

ColtAllen avatar Sep 13 '22 15:09 ColtAllen

What would that approximation look like? Here's my first stab at it:

components, updates = aesara.scan(fn=lambda k, a, b, c, z: (rf(a, k) * rf(b, k) * digamma(a + k) * (z**k)) / (rf(c, k) * fac(k)),
                                  outputs_info=None,
                                  sequences=[aesara.tensor.arange(inf)],
                                  non_sequences=[a, b, c, z])
gradient = components.sum()

Just like in Python/C/etc., a loop with a convergence-based termination criterion is needed. In Aesara, conditions can be added to a Scan via aesara.scan.utils.until. See here.
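For illustration, a convergence-terminated Scan for the first sum in the a-derivative might look roughly like this (an untested sketch: the gamma-based rewrite of the Pochhammer symbols, the tolerance, and the iteration cap are all arbitrary choices, and the digamma(a) * hyp2f1(a, b, c, z) term would still need to be subtracted afterwards):

import aesara
import aesara.tensor as at
from aesara.scan.utils import until

a, b, c, z = at.dscalars("a", "b", "c", "z")

def step(k, prev_sum, a, b, c, z):
    # k-th series term, writing the Pochhammer symbols via gamma:
    # (a)_k = gamma(a + k) / gamma(a), and k! = gamma(k + 1).
    term = (
        at.gamma(a + k) / at.gamma(a)
        * at.gamma(b + k) / at.gamma(b)
        * at.gamma(c) / at.gamma(c + k)
        * at.psi(a + k)
        * z**k
        / at.gamma(k + 1)
    )
    new_sum = prev_sum + term
    # Terminate once the term is negligible.
    return new_sum, until(abs(term) < 1e-10)

partial_sums, _ = aesara.scan(
    step,
    sequences=[at.arange(10000, dtype="float64")],  # hard cap on iterations
    outputs_info=[at.constant(0.0, dtype="float64")],
    non_sequences=[a, b, c, z],
)
series = partial_sums[-1]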

brandonwillard avatar Sep 13 '22 15:09 brandonwillard

Note that you cannot use Scan for the gradient of an Elemwise at the moment: https://github.com/aesara-devs/aesara/issues/512

That's why we have a wrapped Python loop for the gradient of betainc, instead of an Aesara Scan.
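For comparison, the "wrapped Python loop" pattern would put a plain-Python series inside the derivative Op's impl, along these lines (a rough sketch; the tolerance, iteration cap, and function name are illustrative, see the betainc gradient implementation in aesara.scalar.math for the real example):

from scipy.special import hyp2f1, psi

def _hyp2f1_da_python(a, b, c, z, tol=1e-12, max_iter=10000):
    """Plain-Python series for d/da 2F1(a, b; c; z), assuming |z| < 1."""
    total = 0.0
    coeff = 1.0  # (a)_k * (b)_k / ((c)_k * k!) at k = 0
    for k in range(max_iter):
        term = coeff * psi(a + k) * z**k
        total += term
        if k > 0 and abs(term) < tol:
            break
        coeff *= (a + k) * (b + k) / ((c + k) * (k + 1))
    return total - psi(a) * hyp2f1(a, b, c, z)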

ricardoV94 avatar Sep 13 '22 16:09 ricardoV94

Note that you cannot use Scan for the gradient of an Elemwise at the moment: #512

Ah, yeah, we need to get that out of the way.

brandonwillard avatar Sep 13 '22 16:09 brandonwillard

@brandonwillard Draft PR has been created.

ColtAllen avatar Nov 09 '22 18:11 ColtAllen

@brandonwillard Draft PR has been created.

Awesome; thanks!

brandonwillard avatar Nov 09 '22 19:11 brandonwillard