StatsFuns.jl Native implementations for distrs

Making this issue to track the progress in implementing the code for distrs natively

beta
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
binom
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
chisq
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
fdist
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
gamma
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
hyper
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
nbeta
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
nbinom
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
nchisq
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
nfdist
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
norm
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [x] xinvlogcdf,
- [x] xinvlogccdf
ntdist
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
pois
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
tdist
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [x] xinvlogcdf,
- [x] xinvlogccdf

Mar 13 '18 21:03 Nosferican

In conversion from libRmath how should invalid parameter values be handled? The libRmath code returns NaN in most of these cases. Julia code could do the same or could throw an ArgumentError, which I think is the better choice.

Mar 14 '18 18:03 dmbates

I am team @assert cond "Condition violated: cond".

Mar 14 '18 18:03 Nosferican

For example, I would write

function poislogpdf(λ::T, x::T) where {T <: Real}
    λ < 0 && throw(ArgumentError("λ = $λ should be non-negative"))
    isinteger(x) && 0 ≤ x || return T(-Inf)
    iszero(λ) && return iszero(x) ? zero(T) : T(-Inf)
    xlogy(x, λ) - λ - lgamma(x + 1)
end

Is the change in behavior justified?

Mar 14 '18 18:03 dmbates

@Nosferican I think of an assertion as stating a condition that must be true if the algorithm/code is functioning properly. Passing an out-of-bounds parameter value is, to me, pretty much the definition of an ArgumentError. It is not that a condition that should not occur has occurred. It is just an invalid value for the argument.

Also, a programmer may wish to catch the error and take appropriate action. I don't think you can catch an assertion.

Mar 14 '18 18:03 dmbates

DomainError would be even more appropriate for these cases.

Mar 14 '18 18:03 nalimilan

@nalimilan Yes, of course. Thank you.

Mar 14 '18 18:03 dmbates

function poispdf(λ::Real, k::Real)
    if λ < zero(λ)
        throw(DomainError("Invalid support: λ = $λ should be non-negative real number."))
    elseif !isinteger(k)
        return zero(λ)
    elseif iszero(λ)
        if iszero(k)
            return one(λ)
        else
            return zero(λ)
        end
    else
        return λ^k * exp(-λ) / factorial(k) # Replaced with refined implementation.
    end
end

However, doing a quick run, I noticed DomainError does not print the informative message :( at least in Julia 0.6. Is that the desired behavior? I believe that will change in Julia 0.7 through Julia PR 22751

Mar 14 '18 18:03 Nosferican

@Nosferican I disagree with throwing a DomainError for non-integer x. To me there is not a problem with returning 0 (or T(-Inf) for poislogpdf) for all x except the non-negative integers.

I guess it is a matter of how closely you want to follow measure-theoretic probability and the idea that a probability mass function is a density with respect to counting measure. Because we tend to promote types I think of the pdf functions as being defined for all real x.

By the way, in general we should evaluate probabilities on the logarithmic scale first then, if desired, transform back to the probability or probability density scale. And you definitely don't want to call factorial(x) because it overflows for even moderate values of x

julia> factorial(21)
ERROR: OverflowError()
Stacktrace:
 [1] factorial_lookup(::Int64, ::Array{Int64,1}, ::Int64) at ./combinatorics.jl:31
 [2] factorial(::Int64) at ./combinatorics.jl:39

I think the general approach should be to use promote to collect arguments as the same type then evaluate logpdf then, if desired, exponentiate to pdf. Like

poispdf(λ::Real, x::Real) = exp(poislogpdf(λ, x))

function poislogpdf(λ::T, x::T) where {T <: Real}
    λ < 0 && throw(DomainError("λ = $λ must be non-negative"))
    isinteger(x) && 0 ≤ x || return T(-Inf)
    iszero(λ) && return iszero(x) ? zero(T) : T(-Inf)
    xlogy(x, λ) - λ - lgamma(x + 1)
end    

poislogpdf(λ::Number, x::Number) = poislogpdf(promote(float(λ), x)...)

Mar 14 '18 19:03 dmbates

I agree. I was thinking of dispatching on whether the arguments were in the support or not à la what I believe Distributions.jl does (this could be handled with obtaining a DomainError which leads to zero). The factorial was just a dummy for sake of completion, the idea was to determine how to handle the invalid arguments. If promote leads to more efficient implementations and is feasible I am all for it. As for first computing the logpdf, again if it leads to better implementations I all for it as well.

Mar 14 '18 19:03 Nosferican

@Nosferican I see that you are correct about DomainError not taking an explanatory string. In that case I think it may be better to use ArgumentErrror. @nalimilan what do you think? The options are

julia> poislogpdf(-1.f0, 1.f0)
ERROR: DomainError:
Stacktrace:
 [1] poislogpdf(::Float32, ::Float32) at /home/bates/.julia/v0.6/StatsFuns/src/distrs/pois.jl:19

and

julia> poislogpdf(-1.f0, 1.f0)
ERROR: ArgumentError: λ = -1.0 must be non-negative
Stacktrace:
 [1] poislogpdf(::Float32, ::Float32) at /home/bates/.julia/v0.6/StatsFuns/src/distrs/pois.jl:19

Mar 14 '18 19:03 dmbates

That's fixed on 0.7 thanks to https://github.com/JuliaLang/julia/pull/22751. But until then ArgumentError is reasonable too.

Mar 14 '18 20:03 nalimilan

I'd also like to bring up the discussion on unit tests. Current tests will not hold up. We could at first compare our implementation to the Rmath one but eventually I think we need to write our own tests to get an understanding of the function accuracy and precision and inform users of the same.

Mar 16 '18 14:03 Ellipse0934

Another thing to keep in mind when typing the function is that it would be nice for them to be compatible with ForwardDiff and such, which I think means T <: Real.

This seems too restrictive for example:

https://github.com/JuliaStats/StatsFuns.jl/pull/33/files#diff-eb7a8b837bc7832e705af52fca59cc7bR25

Mar 16 '18 15:03 jonathanBieler

@Ellipse0934 Keep in mind that the code in src/distrs/* is temporary. At least I believe this is the case - @simonbyrne or @andreasnoack or @ararslan are more authoritative voices on this.

I think that once a complete set of libRmath-free "d-p-q-r" functions for a distribution is available, the implementation should be moved to the Distributions package the the distr*pdf and distr*cdf functions in this package deprecated. See #42 for example.

Thus extensive unit tests on coverage and accuracy should be part of the Distributions tests. Another type of test, checking the native Julia implementations versus libRmath implementations could, I think, go into a separate package that requires Rmath and Distributions.

Testing a native implementation versus a libRmath implementation in StatsFuns is awkward because the libRmath implementation has a rather general method signature. They tend to be a bit greedy, with unintended consequences. For example, one of the best ways of checking accuracy on, say, a Float64 method is to use a generic method with BigFloat then convert the result to Float64. Except this doesn't work in many cases because the libRmath version grabs the BigFloat argument and converts it to Float64 before you even get started. The tests/generic.jl file uses ForwardDiff.Dual to get around this but that approach has its own problems. For one thing, it introduces a spurious test dependency on the ForwardDiff package.

Mar 16 '18 15:03 dmbates

Any updates on this?

Currently the fallback implementation of beta pdf has different behavior than Rmath (returning NaN instead of -Inf for out of domain values) which causes frustrations in some of my code.

There are commits in the promotions branch which address this and fix it, but I'm worried its just going to sit in this branch.

May 14 '18 15:05 bdeonovic

With #113 we are now pretty far here. What remains is mostly the [x]invlog(c)cdf definitions. We'd probably need versions of the inverse log regularized incomplete gamma and beta functions. Maybe they belong in SpecialFunctions.

Feb 18 '22 12:02 andreasnoack

Could someone post an updated list? EDIT: I'll do it

Sep 07 '23 14:09 simsurace

[ ] beta 8/10 (Didonato and Morris 1992)
[ ] binom 6/10
[x] chisq
[x] fdist
[ ] gamma 8/10
[ ] hyper 0/10
[ ] nbeta 0/10
[ ] nbinom 0/10
[ ] nchisq 0/10
[ ] nfdist 0/10
[x] norm
[ ] ntdist 0/10
[ ] pois 6/10
[ ] srdist 0/8
[x] tdist (via fdist, ndist)

Sep 07 '23 15:09 simsurace

StatsFuns.jl StatsFuns.jl copied to clipboard

Native implementations for distrs

StatsFuns.jl
StatsFuns.jl copied to clipboard