SpecialFunctions.jl icon indicating copy to clipboard operation
SpecialFunctions.jl copied to clipboard

beta_inc in the degenerate cases

Open andreasnoack opened this issue 3 years ago • 4 comments

So we allow one (but not both) of the parameters in beta_inc to be zero https://github.com/JuliaMath/SpecialFunctions.jl/blob/d829f0ac1ebab6d339bf8bada710bde50735ee7d/src/beta_inc.jl#L741-L742 but in that case the integrated density degenerate.

In the original version from the NSWC library they have the following error modes

!        ierr = 1  if a or b is negative
!        ierr = 2  if a = b = 0
!        ierr = 3  if x < 0 or x > 1
!        ierr = 4  if y < 0 or y > 1
!        ierr = 5  if x + y /= 1
!        ierr = 6  if x = a = 0
!        ierr = 7  if y = b = 0

but we don't implement mode 6 and 7. Instead, we unconditionally have https://github.com/JuliaMath/SpecialFunctions.jl/blob/d829f0ac1ebab6d339bf8bada710bde50735ee7d/src/beta_inc.jl#L754-L755. @simonbyrne was that a deliberate decision? Should we error out similarly to the original version? The current behavior

julia> beta_inc(0.0, 2.0, 0.0)
(0.0, 1.0)

is a bit odd from a CDF perspective.

andreasnoack avatar Feb 04 '22 15:02 andreasnoack

I agree: in the case of a beta(a,b) distributions, I think the idea is that if a == 0 && b > 0, then it is concentrated at 0, if you have a > 0 && b == 0 then it is concentrated at 1.

Since the convention for CDF is that the interval should be closed, I think you want something like:

    if a == 0.0 || y == 0.0
        return (1.0, 0.0)
    elseif b == 0.0 || x == 0.0
        return (0.0, 1.0)
    end

simonbyrne avatar Feb 04 '22 23:02 simonbyrne

So we allow one (but not both) of the parameters in beta_inc to be zero but in that case the integrated density degenerate.

Isn't the case where both a and b are zero undefined? In this case it's unclear from the two numbers alone how the limits should be taken: e.g., if you would consider the limit for a sequence with (a_n, b_n) where a_n = 0 and b_n -> 0, then for all n the beta distribution would be concentrated at 0, whereas e.g. if you consider a sequence with a_n -> 0 and b_n = 0, then for all n the beta distribution would be concentrated at 1. Correspondingly, Mathematica returns

> wolframscript -c "BetaRegularized[0.5, 0, 0]"  
Indeterminate

So I think it is reasonable to throw an error or return NaN if a = b = 0.

If a = 0 and b > 0, or a > 0 and b = 0, then we don't run into the same issues AFAICT. Regardless of the sequence we consider, we always converge to the degenerate case at 0, or 1, respectively. IMO it would be reasonable to return

beta_inc(0, 1, 0) = (1.0, 0.0) # currently (0.0, 1.0) and hence handled separately in the StatsFuns PR

This would also be consistent with Mathematica:

> wolframscript -c "BetaRegularized[0, 0, 1]"  
1

[Interestingly, Mathematica returns

> wolframscript -c "BetaRegularized[1, 1, 0]"
0

where we return

julia> beta_inc(1, 0, 1)
(1.0, 0.0)

which seems reasonable IMO.]

devmotion avatar Feb 08 '22 13:02 devmotion

Isn't the case where both a and b are zero undefined?

I think so but I might have formulated it poorly. It wasn't that case I was interested in. I'm interested in discussing the two cases that cause the errors 6 and 7 in the Fortran code, i.e. x = a = 0 and y = b = 0. For those two cases we deviate from the Fortran version by not erroring.

The issue with the two cases is that they can both return (1,0) as well as (0,1) depending on which of x (y) and a (b) that go to zero first, i.e. for any fixed a > 0 and b > 0 you'd get (0, 1) as x -> 0. But for fixed x > 0 and b > 0 you'd get (1, 0) as a -> 0. Likewise for (y,b).

I think I know which of the limits I'd prefer if beta_inc is just the Beta CDF but I guess that there might be other applications of the function where it isn't clear which of the limits to prefer.

andreasnoack avatar Feb 08 '22 21:02 andreasnoack

I think I know which of the limits I'd prefer if beta_inc is just the Beta CDF but I guess that there might be other applications of the function where it isn't clear which of the limits to prefer.

That's a good point. In that case, perhaps we should error, and handle the behaviour in Distributions.jl

simonbyrne avatar Feb 09 '22 05:02 simonbyrne