StatsFuns.jl
StatsFuns.jl copied to clipboard
Native implementations for distrs
Making this issue to track the progress in implementing the code for distrs natively
- beta
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- binom
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- chisq
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- fdist
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- gamma
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- hyper
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- nbeta
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- nbinom
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- nchisq
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- nfdist
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- norm
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [x] xinvlogcdf,
- [x] xinvlogccdf
- ntdist
- [ ] xpdf,
- [ ] xlogpdf,
- [ ] xcdf,
- [ ] xccdf,
- [ ] xlogcdf,
- [ ] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- pois
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [ ] xinvcdf,
- [ ] xinvccdf,
- [ ] xinvlogcdf,
- [ ] xinvlogccdf
- tdist
- [x] xpdf,
- [x] xlogpdf,
- [x] xcdf,
- [x] xccdf,
- [x] xlogcdf,
- [x] xlogccdf,
- [x] xinvcdf,
- [x] xinvccdf,
- [x] xinvlogcdf,
- [x] xinvlogccdf
In conversion from libRmath
how should invalid parameter values be handled? The libRmath
code returns NaN
in most of these cases. Julia code could do the same or could throw an ArgumentError
, which I think is the better choice.
I am team @assert cond "Condition violated: cond"
.
For example, I would write
function poislogpdf(λ::T, x::T) where {T <: Real}
λ < 0 && throw(ArgumentError("λ = $λ should be non-negative"))
isinteger(x) && 0 ≤ x || return T(-Inf)
iszero(λ) && return iszero(x) ? zero(T) : T(-Inf)
xlogy(x, λ) - λ - lgamma(x + 1)
end
Is the change in behavior justified?
@Nosferican I think of an assertion as stating a condition that must be true if the algorithm/code is functioning properly. Passing an out-of-bounds parameter value is, to me, pretty much the definition of an ArgumentError
. It is not that a condition that should not occur has occurred. It is just an invalid value for the argument.
Also, a programmer may wish to catch the error and take appropriate action. I don't think you can catch an assertion.
DomainError
would be even more appropriate for these cases.
@nalimilan Yes, of course. Thank you.
function poispdf(λ::Real, k::Real)
if λ < zero(λ)
throw(DomainError("Invalid support: λ = $λ should be non-negative real number."))
elseif !isinteger(k)
return zero(λ)
elseif iszero(λ)
if iszero(k)
return one(λ)
else
return zero(λ)
end
else
return λ^k * exp(-λ) / factorial(k) # Replaced with refined implementation.
end
end
However, doing a quick run, I noticed DomainError
does not print the informative message :( at least in Julia 0.6. Is that the desired behavior? I believe that will change in Julia 0.7 through Julia PR 22751
@Nosferican I disagree with throwing a DomainError
for non-integer x
. To me there is not a problem with returning 0
(or T(-Inf)
for poislogpdf
) for all x
except the non-negative integers.
I guess it is a matter of how closely you want to follow measure-theoretic probability and the idea that a probability mass function is a density with respect to counting measure. Because we tend to promote types I think of the pdf
functions as being defined for all real x
.
By the way, in general we should evaluate probabilities on the logarithmic scale first then, if desired, transform back to the probability or probability density scale. And you definitely don't want to call factorial(x)
because it overflows for even moderate values of x
julia> factorial(21)
ERROR: OverflowError()
Stacktrace:
[1] factorial_lookup(::Int64, ::Array{Int64,1}, ::Int64) at ./combinatorics.jl:31
[2] factorial(::Int64) at ./combinatorics.jl:39
I think the general approach should be to use promote
to collect arguments as the same type then evaluate logpdf
then, if desired, exponentiate to pdf
. Like
poispdf(λ::Real, x::Real) = exp(poislogpdf(λ, x))
function poislogpdf(λ::T, x::T) where {T <: Real}
λ < 0 && throw(DomainError("λ = $λ must be non-negative"))
isinteger(x) && 0 ≤ x || return T(-Inf)
iszero(λ) && return iszero(x) ? zero(T) : T(-Inf)
xlogy(x, λ) - λ - lgamma(x + 1)
end
poislogpdf(λ::Number, x::Number) = poislogpdf(promote(float(λ), x)...)
I agree. I was thinking of dispatching on whether the arguments were in the support or not à la what I believe Distributions.jl
does (this could be handled with obtaining a DomainError
which leads to zero
). The factorial was just a dummy for sake of completion, the idea was to determine how to handle the invalid arguments. If promote
leads to more efficient implementations and is feasible I am all for it. As for first computing the logpdf
, again if it leads to better implementations I all for it as well.
@Nosferican I see that you are correct about DomainError
not taking an explanatory string. In that case I think it may be better to use ArgumentErrror
. @nalimilan what do you think? The options are
julia> poislogpdf(-1.f0, 1.f0)
ERROR: DomainError:
Stacktrace:
[1] poislogpdf(::Float32, ::Float32) at /home/bates/.julia/v0.6/StatsFuns/src/distrs/pois.jl:19
and
julia> poislogpdf(-1.f0, 1.f0)
ERROR: ArgumentError: λ = -1.0 must be non-negative
Stacktrace:
[1] poislogpdf(::Float32, ::Float32) at /home/bates/.julia/v0.6/StatsFuns/src/distrs/pois.jl:19
That's fixed on 0.7 thanks to https://github.com/JuliaLang/julia/pull/22751. But until then ArgumentError
is reasonable too.
I'd also like to bring up the discussion on unit tests. Current tests will not hold up. We could at first compare our implementation to the Rmath one but eventually I think we need to write our own tests to get an understanding of the function accuracy and precision and inform users of the same.
Another thing to keep in mind when typing the function is that it would be nice for them to be compatible with ForwardDiff and such, which I think means T <: Real
.
This seems too restrictive for example:
https://github.com/JuliaStats/StatsFuns.jl/pull/33/files#diff-eb7a8b837bc7832e705af52fca59cc7bR25
@Ellipse0934 Keep in mind that the code in src/distrs/*
is temporary. At least I believe this is the case - @simonbyrne or @andreasnoack or @ararslan are more authoritative voices on this.
I think that once a complete set of libRmath-free "d-p-q-r" functions for a distribution is available, the implementation should be moved to the Distributions
package the the distr*pdf
and distr*cdf
functions in this package deprecated. See #42 for example.
Thus extensive unit tests on coverage and accuracy should be part of the Distributions
tests. Another type of test, checking the native Julia implementations versus libRmath
implementations could, I think, go into a separate package that requires Rmath
and Distributions
.
Testing a native implementation versus a libRmath implementation in StatsFuns
is awkward because the libRmath implementation has a rather general method signature. They tend to be a bit greedy, with unintended consequences. For example, one of the best ways of checking accuracy on, say, a Float64
method is to use a generic method with BigFloat
then convert the result to Float64
. Except this doesn't work in many cases because the libRmath
version grabs the BigFloat
argument and converts it to Float64
before you even get started. The tests/generic.jl
file uses ForwardDiff.Dual
to get around this but that approach has its own problems. For one thing, it introduces a spurious test dependency on the ForwardDiff
package.
Any updates on this?
Currently the fallback implementation of beta pdf has different behavior than Rmath (returning NaN instead of -Inf for out of domain values) which causes frustrations in some of my code.
There are commits in the promotions
branch which address this and fix it, but I'm worried its just going to sit in this branch.
With #113 we are now pretty far here. What remains is mostly the [x]invlog(c)cdf
definitions. We'd probably need versions of the inverse log regularized incomplete gamma and beta functions. Maybe they belong in SpecialFunctions.
Could someone post an updated list? EDIT: I'll do it
- [ ] beta 8/10 (Didonato and Morris 1992)
- [ ] binom 6/10
- [x] chisq
- [x] fdist
- [ ] gamma 8/10
- [ ] hyper 0/10
- [ ] nbeta 0/10
- [ ] nbinom 0/10
- [ ] nchisq 0/10
- [ ] nfdist 0/10
- [x] norm
- [ ] ntdist 0/10
- [ ] pois 6/10
- [ ] srdist 0/8
- [x] tdist (via fdist, ndist)