
Add fast path for `loglikelihood` of `Uniform` and fix derivatives

devmotion opened this pull request 3 years ago • 4 comments

The PR adds a fast path for loglikelihood(::Uniform, ::AbstractArray{<:Real}) and defines its derivatives to 1) ensure that the derivatives at the boundaries are zero (as per https://github.com/JuliaStats/Distributions.jl/pull/1459) and 2) improve performance. Moreover, the PR fixes and adds tests for the derivatives of logpdf(::Uniform, ::Real) at the boundaries.
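As a rough illustration of the fast-path idea (a hypothetical sketch, not the PR's actual diff; `uniform_loglikelihood` is an illustrative name): every in-support observation of `Uniform(a, b)` contributes the same constant `-log(b - a)`, so the loglikelihood reduces to a support check plus a single `log`, instead of evaluating `logpdf` per element.

```julia
# Hypothetical sketch of the fast path for Uniform(a, b).
function uniform_loglikelihood(a::Real, b::Real, x::AbstractArray{<:Real})
    # If any observation falls outside [a, b], the likelihood is zero,
    # so the loglikelihood is -Inf (in the common floating-point type).
    all(xi -> a <= xi <= b, x) || return oftype(float(b - a), -Inf)
    # Otherwise every observation contributes the same -log(b - a).
    return -length(x) * log(b - a)
end
```

Avoiding a per-element `log` (and a broadcasted `logpdf`) is what produces the roughly tenfold speedup shown in the benchmarks below.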

On master:

julia> using Distributions, Zygote, BenchmarkTools

julia> a = rand();

julia> b = a + rand();

julia> x = rand(Uniform(a, b), 10_000);

julia> @btime loglikelihood($(Uniform(a, b)), $x);
  82.344 μs (1 allocation: 16 bytes)

julia> f(x) = loglikelihood(Uniform(x[1], x[2]), x[3:end]);

julia> x = vcat(0.0, 1.0, rand(100));

julia> @btime Zygote.gradient($f, $x);
  30.667 μs (211 allocations: 36.23 KiB)

julia> x = vcat(0.0, 1.0, -0.5, rand(99));

julia> @btime Zygote.gradient($f, $x);
  31.388 μs (211 allocations: 36.23 KiB)

With this PR:


julia> using Distributions, Zygote, BenchmarkTools

julia> a = rand();

julia> b = a + rand();

julia> x = rand(Uniform(a, b), 10_000);

julia> @btime loglikelihood($(Uniform(a, b)), $x);
  8.561 μs (1 allocation: 16 bytes)

julia> f(x) = loglikelihood(Uniform(x[1], x[2]), x[3:end]);

julia> x = vcat(0.0, 1.0, rand(100));

julia> @btime Zygote.gradient($f, $x);
  5.322 μs (62 allocations: 4.67 KiB)

julia> x = vcat(0.0, 1.0, -0.5, rand(99));

julia> @btime Zygote.gradient($f, $x);
  5.266 μs (62 allocations: 4.67 KiB)

devmotion avatar Jan 23 '22 00:01 devmotion

Codecov Report

Merging #1490 (cab5e86) into master (1fe8d85) will decrease coverage by 0.98%. The diff coverage is 100.00%.

:exclamation: Current head cab5e86 differs from pull request most recent head 9b51dce. Consider uploading reports for the commit 9b51dce to get more accurate results

@@            Coverage Diff             @@
##           master    #1490      +/-   ##
==========================================
- Coverage   85.45%   84.47%   -0.98%     
==========================================
  Files         128      124       -4     
  Lines        7819     7519     -300     
==========================================
- Hits         6682     6352     -330     
- Misses       1137     1167      +30     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/univariate/continuous/uniform.jl | 94.44% <100.00%> (+1.50%) | :arrow_up: |
| src/univariate/continuous/studentizedrange.jl | 75.00% <0.00%> (-9.62%) | :arrow_down: |
| src/univariate/continuous/pgeneralizedgaussian.jl | 61.76% <0.00%> (-6.81%) | :arrow_down: |
| src/univariate/continuous/erlang.jl | 65.71% <0.00%> (-5.72%) | :arrow_down: |
| src/univariate/continuous/noncentralbeta.jl | 94.44% <0.00%> (-5.56%) | :arrow_down: |
| src/univariate/continuous/noncentralchisq.jl | 80.95% <0.00%> (-5.42%) | :arrow_down: |
| src/univariate/continuous/noncentralt.jl | 85.71% <0.00%> (-5.20%) | :arrow_down: |
| src/univariate/continuous/noncentralf.jl | 86.95% <0.00%> (-4.72%) | :arrow_down: |
| src/multivariates.jl | 38.46% <0.00%> (-4.40%) | :arrow_down: |
| src/univariate/continuous/chisq.jl | 75.00% <0.00%> (-4.32%) | :arrow_down: |
| ... and 79 more | | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 1fe8d85...9b51dce.

codecov-commenter avatar Jan 23 '22 00:01 codecov-commenter

@devmotion fixed the conflict here, what's the status?

matbesancon avatar Apr 18 '22 09:04 matbesancon

Thanks for the ping!

I postponed the PR because I wanted to think about it carefully once more. I came to the conclusion that

  • in my opinion it is useful to add definitions for the derivatives of the log pdf and loglikelihood, both to ensure that the computation is correct in the way we define it, regardless of the AD-backend specific quirks, and to ensure that computations are fast for such an important distribution,
  • in my opinion, one should define the derivatives as NaN for values outside of the support, regardless of how (not) useful NaN is in general. Otherwise, if one defines the derivatives of logpdf(::Uniform, x::Real) and loglikelihood(::Uniform, xs::AbstractArray{<:Real}) as zero whenever x, or at least one element of xs, lies outside the support (as in this PR), additivity is violated: the derivative of sum(logpdf.(::Uniform, xs)) will be non-zero and hence differ from the derivative of loglikelihood. So it seems we want a "destructive" value as the derivative instead, and a (type-stable) NaN seems the natural choice.
  • I still think it makes sense to define the derivatives at the boundaries, and in particular to ensure that they are finite (i.e., not NaN). This helps, e.g., to avoid surprising results when one is numerically too close to the boundary. I think the one-sided limits are the somewhat more natural choice. I can still see that zero is an appealing choice as well, since it avoids problems with gradient descent algorithms (as discussed in this PR and the previous one), but personally I find this special case, and the discontinuity it creates, somewhat unsatisfying. Moreover, optimization algorithms/packages are arguably responsible for respecting domain constraints; this should not be encoded implicitly in the derivatives.
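The convention described in the last two bullets can be sketched as follows (hypothetical code, `dlogpdf_dx` is an illustrative name, not an API of this PR): the derivative of logpdf with respect to x is zero inside the support, the one-sided limit (also zero, hence finite) at the boundaries, and a type-stable NaN outside the support.

```julia
# Hypothetical sketch of the proposed derivative convention for
# d/dx logpdf(Uniform(a, b), x): finite (zero) on the closed support,
# NaN outside it.
function dlogpdf_dx(a::Real, b::Real, x::Real)
    # Compute a common floating-point type for type stability.
    T = float(promote_type(typeof(a), typeof(b), typeof(x)))
    # Zero inside the support and at the boundaries (one-sided limit);
    # a type-stable NaN outside the support.
    return a <= x <= b ? zero(T) : T(NaN)
end
```

With NaN outside the support, differentiating sum(logpdf.(d, xs)) and loglikelihood(d, xs) agree (both propagate NaN as soon as one element is out of support), which is exactly the additivity that a zero derivative would break.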

Since this PR does not respect these considerations, I think it should not be merged. If others agree that these suggestions are reasonable, I guess it's cleaner to open a new, separate PR based on them than to update this one. I could open such a PR right away, but I'd like to have some feedback first 🙂

devmotion avatar Apr 18 '22 10:04 devmotion

The argument for NaN outside of the support makes sense, yes.

matbesancon avatar Apr 25 '22 21:04 matbesancon