Distributions.jl icon indicating copy to clipboard operation
Distributions.jl copied to clipboard

Throw an error when truncating outside of support

Open ararslan opened this issue 1 year ago • 5 comments

Fixes #843 using option 2 as described here. The implementation checks for 0 probability within the truncated region but continues to allow coincident truncation bounds if they're within the support. That avoids breaking cases like that shown in #1712.

ararslan avatar May 18 '23 17:05 ararslan

Looks like this approach doesn't quite do it, as evidenced by the large number of test failures 🙃

ararslan avatar May 18 '23 19:05 ararslan

Some of them at least are due to tests that have to be fixed/removed: https://github.com/JuliaStats/Distributions.jl/actions/runs/5017513840/jobs/8995725428?pr=1723#step:6:67

Some others seem to indicate that logtp = -Inf currently for normal distributions truncated far away from the mean. Even though some of their traits (e.g., mean and variance) are tested they are basically useless:

julia> d = truncated(Normal(0, 1), 100, 115)
Truncated(Normal{Float64}(μ=0.0, σ=1.0); lower=100.0, upper=115.0)

julia> mean(d)
100.00999800099926

julia> var(d)
9.994005086296622e-5

julia> pdf(d, 101)
NaN

julia> logpdf(d, 101)
Inf

julia> d.logtp
-Inf

julia> d.tp
0.0

I think we might be able to fix these existing numerical issues for normal distributions (at least) by improving the numerical accuracy in the calculation of logtp. IMO this should be possible since

julia> SpecialFunctions.logerf(100/sqrt(2), 115/sqrt(2)) - log(2)
-5005.5242086942035

It seems one part could consist of improving logdiffcdf for normal distributions as currently

julia> logdiffcdf(Normal(0, 1), 115, 100)
-Inf

And then maybe logdiffcdf could be used in truncated?

devmotion avatar May 24 '23 23:05 devmotion

I opened #1728 to address the logdiffcdf issue for normal distributions.

devmotion avatar May 25 '23 09:05 devmotion

Something @devmotion wrote in #1721 reminds me that there's actually 2 separate issues that are closely related, but might require different solutions.

I think we would want to catch this case earlier in truncated: A distribution with zero mass is by definition not a distribution. See https://github.com/JuliaStats/Distributions.jl/issues/843.

I would argue that something like truncated(Normal(0,1), 0, 0) is actually a degenerate distribution with infinite mass at a single point. On the other hand, truncated(Uniform(0,0), -2, -1) is indeed not a distribution.

Now the question is, should we handle these cases differently?

It seems to me like this pull request would error out on randf(Normal(0,1), 0, 0) since:

julia> logdiffcdf(Normal(0,1), 0, 0)
-Inf

Note that this currently works "as proper", in my opinion.

julia> t = truncated(Normal(0,1), 0, 0)
Truncated(Normal{Float64}(μ=0.0, σ=1.0); lower=0.0, upper=0.0)

julia> rand(t)
0.0

julia> t = truncated(Normal(0,1), 1, 1)
Truncated(Normal{Float64}(μ=0.0, σ=1.0); lower=1.0, upper=1.0)

julia> rand(t)
1.0

quildtide avatar Aug 01 '24 20:08 quildtide

You're right, the truncated(Normal(0, 1), 0, 0) case is unintentionally broken by the change in https://github.com/JuliaStats/Distributions.jl/pull/1723/commits/94d28172699dd62ab4d160ce54ba741905e60617.

ararslan avatar Aug 02 '24 00:08 ararslan