Distributions.jl
Distributions.jl copied to clipboard
Throw an error when truncating outside of support
Fixes #843 using option 2 as described here. The implementation checks for 0 probability within the truncated region but continues to allow coincident truncation bounds if they're within the support. That avoids breaking cases like that shown in #1712.
Looks like this approach doesn't quite do it, as evidenced by the large number of test failures 🙃
Some of them at least are due to tests that have to be fixed/removed: https://github.com/JuliaStats/Distributions.jl/actions/runs/5017513840/jobs/8995725428?pr=1723#step:6:67
Some others seem to indicate that logtp = -Inf
currently for normal distributions truncated far away from the mean. Even though some of their traits (e.g., mean and variance) are tested they are basically useless:
julia> d = truncated(Normal(0, 1), 100, 115)
Truncated(Normal{Float64}(μ=0.0, σ=1.0); lower=100.0, upper=115.0)
julia> mean(d)
100.00999800099926
julia> var(d)
9.994005086296622e-5
julia> pdf(d, 101)
NaN
julia> logpdf(d, 101)
Inf
julia> d.logtp
-Inf
julia> d.tp
0.0
I think we might be able to fix these existing numerical issues for normal distributions (at least) by improving the numerical accuracy in the calculation of logtp
. IMO this should be possible since
julia> SpecialFunctions.logerf(100/sqrt(2), 115/sqrt(2)) - log(2)
-5005.5242086942035
It seems one part could consist of improving logdiffcdf
for normal distributions as currently
julia> logdiffcdf(Normal(0, 1), 115, 100)
-Inf
And then maybe logdiffcdf
could be used in truncated
?
I opened #1728 to address the logdiffcdf
issue for normal distributions.
Something @devmotion wrote in #1721 reminds me that there's actually 2 separate issues that are closely related, but might require different solutions.
I think we would want to catch this case earlier in truncated: A distribution with zero mass is by definition not a distribution. See https://github.com/JuliaStats/Distributions.jl/issues/843.
I would argue that something like truncated(Normal(0,1), 0, 0)
is actually a degenerate distribution with infinite mass at a single point. On the other hand, truncated(Uniform(0,0), -2, -1)
is indeed not a distribution.
Now the question is, should we handle these cases differently?
It seems to me like this pull request would error out on randf(Normal(0,1), 0, 0)
since:
julia> logdiffcdf(Normal(0,1), 0, 0)
-Inf
Note that this currently works "as proper", in my opinion.
julia> t = truncated(Normal(0,1), 0, 0)
Truncated(Normal{Float64}(μ=0.0, σ=1.0); lower=0.0, upper=0.0)
julia> rand(t)
0.0
julia> t = truncated(Normal(0,1), 1, 1)
Truncated(Normal{Float64}(μ=0.0, σ=1.0); lower=1.0, upper=1.0)
julia> rand(t)
1.0
You're right, the truncated(Normal(0, 1), 0, 0)
case is unintentionally broken by the change in https://github.com/JuliaStats/Distributions.jl/pull/1723/commits/94d28172699dd62ab4d160ce54ba741905e60617.