Distributions.jl
Distributions.jl copied to clipboard
Add fast path for `loglikelihood` of `Uniform` and fix derivatives
The PR adds a fast path for loglikelihood(::Uniform, ::AbstractArray{<:Real}) and defines its derivatives to 1) ensure that the derivatives at the boundaries are zero (as per https://github.com/JuliaStats/Distributions.jl/pull/1459) and 2) improve performance. Moreover, the PR fixes and adds tests for the derivatives of logpdf(::Uniform, ::Real) at the boundaries.
On master:
julia> using Distributions, Zygote, BenchmarkTools
julia> a = rand();
julia> b = a + rand();
julia> x = rand(Uniform(a, b), 10_000);
julia> @btime loglikelihood($(Uniform(a, b)), $x);
82.344 μs (1 allocation: 16 bytes)
julia> f(x) = loglikelihood(Uniform(x[1], x[2]), x[3:end]);
julia> x = vcat(0.0, 1.0, rand(100));
julia> @btime Zygote.gradient($f, $x);
30.667 μs (211 allocations: 36.23 KiB)
julia> x = vcat(0.0, 1.0, -0.5, rand(99));
julia> @btime Zygote.gradient($f, $x);
31.388 μs (211 allocations: 36.23 KiB)
With this PR:
julia> using Distributions, Zygote, BenchmarkTools
julia> a = rand();
julia> b = a + rand();
julia> x = rand(Uniform(a, b), 10_000);
julia> @btime loglikelihood($(Uniform(a, b)), $x);
8.561 μs (1 allocation: 16 bytes)
julia> f(x) = loglikelihood(Uniform(x[1], x[2]), x[3:end]);
julia> x = vcat(0.0, 1.0, rand(100));
julia> @btime Zygote.gradient($f, $x);
5.322 μs (62 allocations: 4.67 KiB)
julia> x = vcat(0.0, 1.0, -0.5, rand(99));
julia> @btime Zygote.gradient($f, $x);
5.266 μs (62 allocations: 4.67 KiB)
Codecov Report
Merging #1490 (cab5e86) into master (1fe8d85) will decrease coverage by
0.97%. The diff coverage is100.00%.
:exclamation: Current head cab5e86 differs from pull request most recent head 9b51dce. Consider uploading reports for the commit 9b51dce to get more accurate results
@@ Coverage Diff @@
## master #1490 +/- ##
==========================================
- Coverage 85.45% 84.47% -0.98%
==========================================
Files 128 124 -4
Lines 7819 7519 -300
==========================================
- Hits 6682 6352 -330
- Misses 1137 1167 +30
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/univariate/continuous/uniform.jl | 94.44% <100.00%> (+1.50%) |
:arrow_up: |
| src/univariate/continuous/studentizedrange.jl | 75.00% <0.00%> (-9.62%) |
:arrow_down: |
| src/univariate/continuous/pgeneralizedgaussian.jl | 61.76% <0.00%> (-6.81%) |
:arrow_down: |
| src/univariate/continuous/erlang.jl | 65.71% <0.00%> (-5.72%) |
:arrow_down: |
| src/univariate/continuous/noncentralbeta.jl | 94.44% <0.00%> (-5.56%) |
:arrow_down: |
| src/univariate/continuous/noncentralchisq.jl | 80.95% <0.00%> (-5.42%) |
:arrow_down: |
| src/univariate/continuous/noncentralt.jl | 85.71% <0.00%> (-5.20%) |
:arrow_down: |
| src/univariate/continuous/noncentralf.jl | 86.95% <0.00%> (-4.72%) |
:arrow_down: |
| src/multivariates.jl | 38.46% <0.00%> (-4.40%) |
:arrow_down: |
| src/univariate/continuous/chisq.jl | 75.00% <0.00%> (-4.32%) |
:arrow_down: |
| ... and 79 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 1fe8d85...9b51dce. Read the comment docs.
@devmotion fixed the conflict here, what's the status?
Thanks for the ping!
I postponed the PR because I wanted to think about it carefully once more. I came to the conclusion that
- in my opinion it is useful to add definitions for the derivatives of the log pdf and loglikelihood, both to ensure that the computation is correct in the way we define it, regardless of the AD-backend specific quirks, and to ensure that computations are fast for such an important distribution,
- in my opinion, one should define the derivatives as NaN for values outside of the support, regardless of how (not) useful NaN is in general. Otherwise if one defines the derivative of
logpdf(::Uniform, x::Real)andloglikelihood(::Uniform, xs::AbstractArray{<:Real})as zero ifxor at least one element ofxsis outside the support, as in this PR, additivity is violated: the derivative ofsum(logpdf.(::Uniform, xs))will be non-zero and hence different from the derivative ofloglikelihood. Thus it seems we want a "destructive" value as derivative instead, and (a type-stable) NaN seems the natural choice. - I still think it makes sense to define the derivatives at the boundaries, and in particular ensure that they are finite (ie not NaN). This helps, eg, to avoid surprising results if one is numerically too close to the boundary. I think the one-sided limits are a bit more natural choice. I can still see that zero is an appealing choice as well since it avoids problems with gradient descent algorithms (as discussed in this PR and the previous one), but personally I think this special case and the discontinuity it creates are somehow unsatisfying. Moreover, arguably optimization algorithms/packages are responsible for taking care of and respecting domain constraints, and it should not be encoded implicitly in the derivatives.
Since this PR does not respect these considerations, I think it should not be merged. I guess it's cleaner to open a new separate PR based on these suggestions if others agree that they seem reasonable than updating the PR. I could open such a PR right away but I'd like to have some feedback first 🙂
the argument for NaN outside of the support makes sense yes