
Support ADTypes and autodiff by default

Open sethaxen opened this issue 11 months ago • 10 comments

Turing, Optimization, and LogDensityProblems now support specifying an AD backend using ADTypes. Turing and Optimization do this via an adtype keyword. This PR adds an adtype parameter to pathfinder and multipathfinder, which then hooks into Optimization.jl's AD machinery to automatically differentiate the log-density function.

Since we already depend on ForwardDiff indirectly via Optim, we use AutoForwardDiff as the default adtype, so that log-density functions are automatically differentiated by default.

As a result, the PR also adds support for generic log-density functions (not just those that implement the LogDensityProblems interface).
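As a rough sketch of what this enables (the target function, initial point, and ReverseDiff variant below are illustrative, not from the PR; see the pathfinder docstring for the final signature):

using ADTypes, Pathfinder
using ReverseDiff  # loading the AD package makes Optimization.jl's corresponding extension available

# A plain log-density function that does not implement the LogDensityProblems interface.
logp_banana(x) = -x[1]^2 / 2 - (x[2] - x[1]^2)^2 / 2

# `init` supplies a starting point (and hence the dimension). With no adtype given,
# AutoForwardDiff() is used, so the gradient is computed automatically.
result_fd = pathfinder(logp_banana; init = randn(2))

# The new `adtype` keyword switches the backend used by Optimization.jl's AD machinery.
result_rd = pathfinder(logp_banana; init = randn(2), adtype = ADTypes.AutoReverseDiff())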

Fixes #93

sethaxen · Mar 16 '24 18:03

Codecov Report

Attention: Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.61%. Comparing base (fae6c4b) to head (dccbc6d).

Files Patch % Lines
ext/PathfinderTuringExt.jl 0.00% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #178      +/-   ##
==========================================
- Coverage   84.72%   80.61%   -4.12%     
==========================================
  Files          13       13              
  Lines         609      614       +5     
==========================================
- Hits          516      495      -21     
- Misses         93      119      +26     


codecov[bot] · Mar 16 '24 18:03

Hi @sethaxen,

Following up from #93. For our example, pathfinder fails here with a MethodError on a promote_rule:

MethodError: promote_rule(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationReverseDiffExt.OptimizationReverseDiffTag, Float64}, Float64, 12}}) is ambiguous.

It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example.

Is that an error from a forward-over-reverse Hessian calculation?

SamuelBrand1 · Mar 18 '24 14:03

It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example.

At first glance, I don't think this is a Pathfinder issue. We only use log2π in one place, and it's outside of anything AD should be seeing. Does this work fine?

using Turing, Optimization, OptimizationOptimJL
prob = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)
sol = solve(prob, LBFGS())

If so, this is more or less what Pathfinder is doing internally, so the issue is probably somewhere in Pathfinder. If this fails, then the issue is somewhere else. Can Turing sample fine with HMC (as in, can it compute the gradient of the log-density without erroring)?
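Something along these lines would be a minimal sketch of that check (the NUTS adtype keyword is assumed here, and inference_mdl is the model from the example):

using Turing

# A handful of NUTS transitions is enough to confirm that the gradient of the model's
# log-density can be computed with the same ReverseDiff backend, independently of Pathfinder.
chn = sample(inference_mdl, NUTS(; adtype = AutoReverseDiff(true)), 10)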

Can you post the full stack trace?

Is that an error from a forward-over-reverse Hessian calculation?

Not certain. Neither L-BFGS nor Pathfinder computes the Hessian, so if your model doesn't either, I don't see why this would be forward-over-reverse. But ReverseDiff does use ForwardDiff internally in a few places.

sethaxen · Mar 18 '24 14:03

It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example. Does this work fine?

using Turing, Optimization, OptimizationOptimJL
prob = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)
sol = solve(prob, LBFGS())

For some reason that doesn't generate an init; I thought there was a fallback?

ERROR: MethodError: no method matching init(::@NamedTuple{…}, ::LBFGS{…})
Closest candidates are:
  init(::OptimizationProblem, ::Any, Any...; kwargs...)
    @ SciMLBase ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:164
  init(::Roots.ZeroProblem, ::Any; kwargs...)
    @ Roots ~/.julia/packages/Roots/neTBD/src/find_zero.jl:306
  init(::Roots.AbstractUnivariateZeroMethod, ::Any, ::Roots.AbstractUnivariateZeroState)
    @ Roots ~/.julia/packages/Roots/neTBD/src/find_zero.jl:311
  ...

Stacktrace:
  [1] solve(::@NamedTuple{…}, ::Vararg{…}; kwargs::@Kwargs{})
    @ CommonSolve ~/.julia/packages/CommonSolve/JfpfI/src/CommonSolve.jl:23
  [2] top-level scope
    @ REPL[9]:1
Some type information was truncated. Use show(err) to see complete types.

EDIT: Turing.optim_problem does create a Turing.Optimisation.Init object as a field of prob, but it's obviously not dispatching for some reason.

SamuelBrand1 · Mar 19 '24 10:03

Can you post the full stack trace?

Here you go:

ERROR: MethodError: promote_rule(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationReverseDiffExt.OptimizationReverseDiffTag, Float64}, Float64, 12}}) is ambiguous.

Candidates:
  promote_rule(::Type{S}, ::Type{T}) where {S<:AbstractIrrational, T<:Number}
    @ Base irrationals.jl:45
  promote_rule(::Type{<:AbstractIrrational}, ::Type{T}) where T<:Real
    @ Base irrationals.jl:44
  promote_rule(::Type{R}, ::Type{ForwardDiff.Dual{T, V, N}}) where {R<:Real, T, V, N}
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/dual.jl:427

Possible fix, define
  promote_rule(::Type{R}, ::Type{ForwardDiff.Dual{T, V, N}}) where {R<:AbstractIrrational, T, V, N}

Stacktrace:
  [1] promote_type(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 12}}) @ Base ./promotion.jl:313
  [2] promote_type(::Type, ::Type, ::Type) @ Base ./promotion.jl:299
  [3] ForwardOptimize @ ~/.julia/packages/ReverseDiff/UJhiD/src/macros.jl:122 [inlined]
  [4] + @ ~/.julia/packages/ReverseDiff/UJhiD/src/derivatives/scalars.jl:17 [inlined]
  [5] normlogpdf(z::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing}) @ StatsFuns ~/.julia/packages/StatsFuns/mrf0e/src/distrs/norm.jl:29
  [6] normlogpdf(μ::Float64, σ::Float64, x::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing}) @ StatsFuns ~/.julia/packages/StatsFuns/mrf0e/src/distrs/norm.jl:41
  [7] logpdf(d::Normal{Float64}, x::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing}) @ Distributions ~/.julia/packages/Distributions/UaWBm/src/univariates.jl:645
  [8] logpdf(d::Truncated{…}, x::ReverseDiff.TrackedReal{…}) @ Distributions ~/.julia/packages/Distributions/UaWBm/src/truncate.jl:161
  [9] tilde_assume(ctx::Turing.Optimisation.OptimizationContext{…}, dist::Truncated{…}, vn::VarName{…}, vi::DynamicPPL.ThreadSafeVarInfo{…}) @ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:60
  [10] tilde_assume @ ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:63 [inlined]
  [11] tilde_assume @ ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:57 [inlined]
  [12] tilde_assume!!(context::DynamicPPL.FixedContext{…}, right::Truncated{…}, vn::VarName{…}, vi::DynamicPPL.ThreadSafeVarInfo{…}) @ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:138
  [13] generate_latent(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::DynamicPPL.FixedContext{…}, latent_model::RandomWalk{…}, n::Int64) @ EpiAware.EpiLatentModels ~/GitHub/CFA/Rt-without-renewal/EpiAware/src/EpiLatentModels/randomwalk.jl:89
  [14] _evaluate!! @ ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:963 [inlined]
  [15] macro expansion @ ~/.julia/packages/DynamicPPL/pg70d/src/submodel_macro.jl:244 [inlined]
  [16] make_epi_aware(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::DynamicPPL.FixedContext{…}, y_t::Vector{…}, time_steps::Int64; epi_model::DirectInfections{…}, latent_model::RandomWalk{…}, observation_model::DelayObservations{…}) @ EpiAware ~/GitHub/CFA/Rt-without-renewal/EpiAware/src/make_epi_aware.jl:8
  [17] _evaluate!!(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…}) @ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:963
  [18] evaluate_threadsafe!!(model::Model{…}, varinfo::TypedVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…}) @ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:952
  [19] evaluate!!(model::Model{…}, varinfo::TypedVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…}) @ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:887
  [20] (::LogDensityFunction{…})(z::ReverseDiff.TrackedArray{…}) @ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:111
  [21] (::Turing.Optimisation.var"#l#4"{…})(x::ReverseDiff.TrackedArray{…}, ::SciMLBase.NullParameters) @ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:294
  [22] (::OptimizationReverseDiffExt.var"#51#75"{…})(::ReverseDiff.TrackedArray{…}) @ OptimizationReverseDiffExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationReverseDiffExt.jl:157
  [23] ReverseDiff.GradientTape(f::OptimizationReverseDiffExt.var"#51#75"{…}, input::Vector{…}, cfg::ReverseDiff.GradientConfig{…}) @ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/api/tape.jl:199
  [24] ReverseDiff.GradientTape(f::Function, input::Vector{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 12}}) @ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/api/tape.jl:198
  [25] instantiate_function(f::OptimizationFunction{…}, cache::OptimizationBase.ReInitCache{…}, adtype::AutoReverseDiff, num_cons::Int64) @ OptimizationReverseDiffExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationReverseDiffExt.jl:187
  [26] OptimizationCache(prob::OptimizationProblem{…}, opt::LBFGS{…}, data::Base.Iterators.Cycle{…}; callback::Pathfinder.OptimizationCallback{…}, maxiters::Int64, maxtime::Nothing, abstol::Nothing, reltol::Nothing, progress::Bool, kwargs::@Kwargs{}) @ OptimizationBase ~/.julia/packages/OptimizationBase/rRpJs/src/cache.jl:27
  [27] OptimizationCache @ ~/.julia/packages/OptimizationBase/rRpJs/src/cache.jl:17 [inlined]
  [28] #__init#2 @ ~/.julia/packages/OptimizationOptimJL/fdBis/src/OptimizationOptimJL.jl:101 [inlined]
  [29] __init (repeats 2 times) @ ~/.julia/packages/OptimizationOptimJL/fdBis/src/OptimizationOptimJL.jl:68 [inlined]
  [30] #init#599 @ ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:166 [inlined]
  [31] init @ ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:164 [inlined]
  [32] solve(::OptimizationProblem{…}, ::LBFGS{…}; kwargs::@Kwargs{…}) @ SciMLBase ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:96
  [33] optimize_with_trace(prob::OptimizationProblem{…}, optimizer::LBFGS{…}; progress_name::String, progress_id::Base.UUID, maxiters::Int64, callback::Nothing, fail_on_nonfinite::Bool, kwargs::@Kwargs{}) @ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/optimize.jl:53
  [34] optimize_with_trace @ ~/.julia/packages/Pathfinder/99eNx/src/optimize.jl:34 [inlined]
  [35] _pathfinder(rng::Random._GLOBAL_RNG, prob::OptimizationProblem{…}, logp::Pathfinder.var"#logp#26"{…}; history_length::Int64, optimizer::LBFGS{…}, ndraws_elbo::Int64, executor::SequentialEx{…}, kwargs::@Kwargs{…}) @ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:290
  [36] _pathfinder @ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:279 [inlined]
  [37] _pathfinder_try_until_succeed(rng::Random._GLOBAL_RNG, prob::OptimizationProblem{…}, logp::Pathfinder.var"#logp#26"{…}; ntries::Int64, init_scale::Int64, init_sampler::Pathfinder.UniformSampler{…}, allow_mutating_init::Bool, kwargs::@Kwargs{…}) @ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:265
  [38] _pathfinder_try_until_succeed @ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:253 [inlined]
  [39] #25 @ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:182 [inlined]
  [40] progress(f::Pathfinder.var"#25#27"{…}; name::String) @ ProgressLogging ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:262
  [41] progress @ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:258 [inlined]
  [42] pathfinder(prob::OptimizationProblem{…}; rng::Random._GLOBAL_RNG, history_length::Int64, optimizer::LBFGS{…}, ndraws_elbo::Int64, ndraws::Int64, input::Model{…}, kwargs::@Kwargs{}) @ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:181
  [43] pathfinder(model::Model{…}; rng::Random._GLOBAL_RNG, init_scale::Int64, init_sampler::Pathfinder.UniformSampler{…}, init::Nothing, adtype::AutoReverseDiff, kwargs::@Kwargs{}) @ PathfinderTuringExt ~/.julia/packages/Pathfinder/99eNx/ext/PathfinderTuringExt.jl:118
  [44] top-level scope @ ~/GitHub/CFA/Rt-without-renewal/EpiAware/docs/src/examples/getting_started.jl:295
Some type information was truncated. Use show(err) to see complete types.

SamuelBrand1 · Mar 19 '24 10:03

Can Turing sample fine with HMC (as in, can it compute the gradient of the log-density without erroring)?

Yes, the workflow is that pathfinder is part of the initialisation for an HMC/NUTS sampler, and NUTS runs okay (even better with good pre-heating!).
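For context, a rough sketch of that workflow (the field names follow Pathfinder's documentation on initializing HMC, but the indexing and keyword names here are illustrative and may need adjusting):

using Pathfinder, Turing

# Run pathfinder on the Turing model first ...
result = pathfinder(inference_mdl; ndraws = 10)

# ... then warm-start NUTS from one of its draws. `draws_transformed` holds draws in the
# model's constrained space (provided by the Turing extension). On older Turing versions
# the keyword is `init_params` rather than `initial_params`.
init_params = collect(result.draws_transformed.value[1, :, 1])
chn = sample(inference_mdl, NUTS(), 1_000; initial_params = init_params)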

SamuelBrand1 · Mar 19 '24 10:03

Thanks for looking into this!

SamuelBrand1 · Mar 19 '24 10:03

@SamuelBrand1 this line uses a deprecated syntax: https://github.com/CDCgov/Rt-without-renewal/blob/d6344cc6e451e3e6c4188e4984247f890ae60795/EpiAware/src/EpiLatentModels/utils.jl#L19, which could be the issue here. Try this instead, and let me know if it works: truncated(Normal(0, prior_mean * sqrt(pi) / sqrt(2)); lower=0)
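A hedged sketch of that suggestion as runnable code (prior_mean is a placeholder value; the deprecated form on the linked line is not reproduced here):

using Distributions

# Keyword form of `truncated`: a half-normal prior truncated below at zero.
prior_mean = 1.0  # placeholder
prior = truncated(Normal(0, prior_mean * sqrt(π) / sqrt(2)); lower = 0)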

sethaxen · Mar 19 '24 13:03

That doesn't solve this problem.

However, switching to AutoReverseDiff(false) does. The model I'm sampling can fail because it can sample some big numbers in the initialisation phase, which causes problems with sampling from a negative binomial. This seems to cause problems for pathfinder with ReverseDiff, but it is either not a problem, or a silent one, for AdvancedHMC NUTS calls.

I think making the model more resilient is our problem though!

Thanks for helping so much.

SamuelBrand1 · Mar 19 '24 13:03

However, switching to AutoReverseDiff(false) does

Ah, okay. Tape compilation is only safe if the same branches of all control flow are taken on every execution of the program. I'm not certain what control flow you might be encountering here. If this is the issue, it's maybe a little strange that Turing and Pathfinder would hit different branches. One way this can happen is if the bulk of the probability mass hits one branch while the mode hits a different one; since modes can look very different from the bulk, you would then expect any good optimizer to end up on a different branch.
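A toy illustration of this failure mode (not from the thread; plain ReverseDiff API, analogous to AutoReverseDiff(true) versus AutoReverseDiff(false)):

using ReverseDiff

# With value-dependent control flow, a compiled tape records only the branch taken when the
# tape was built and silently replays it for every later input.
f(x) = x[1] > 0 ? sum(abs2, x) : sum(x)

tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, [1.0, 2.0]))  # records the x[1] > 0 branch
ReverseDiff.gradient!(zeros(2), tape, [-1.0, 2.0])  # wrong: still differentiates sum(abs2, x)
ReverseDiff.gradient(f, [-1.0, 2.0])                # correct: re-traces and takes the other branch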

Either way, if AutoReverseDiff(false) is working with Pathfinder, that is a good indication that this PR is doing the right thing and that the issue is probably elsewhere.

Thanks for helping so much.

No problem!

sethaxen · Mar 19 '24 14:03