Pathfinder.jl
Support ADTypes and autodiff by default
Turing, Optimization, and LogDensityProblems now support specifying an AD backend using ADTypes. Turing and Optimization do this via an `adtype` keyword. This PR adds an `adtype` keyword to `pathfinder` and `multipathfinder`, which hooks into Optimization.jl's AD machinery to automatically differentiate the log-density function. Since we already depend on ForwardDiff indirectly via Optim, we use `AutoForwardDiff` as the default `adtype`, so that log-density functions are automatically differentiated by default. As a result, the PR also adds support for generic log-density functions (not just those that implement the LogDensityProblems interface).
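For illustration, a rough sketch of how the new keyword might be used with a plain Julia function (the toy log-density is made up, and the starting point is assumed to be supplied via `init`):

```julia
using Pathfinder, ADTypes

# toy log-density: standard multivariate normal
logp(x) = -sum(abs2, x) / 2

# AutoForwardDiff() is the default adtype; any ADTypes backend can be passed instead
result = pathfinder(logp; init=randn(5), adtype=AutoForwardDiff())
```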
Fixes #93
Codecov Report
Attention: Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.61%. Comparing base (fae6c4b) to head (dccbc6d).
Files | Patch % | Lines |
---|---|---|
ext/PathfinderTuringExt.jl | 0.00% | 2 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #178      +/-   ##
==========================================
- Coverage   84.72%   80.61%    -4.12%
==========================================
  Files          13       13
  Lines         609      614        +5
==========================================
- Hits          516      495       -21
- Misses         93      119       +26
```
Hi @sethaxen,

Following up from #93. For our example, `pathfinder` fails here with a `MethodError` on a `promote_rule`:

```
MethodError: promote_rule(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationReverseDiffExt.OptimizationReverseDiffTag, Float64}, Float64, 12}}) is ambiguous.
```

It's a bit of a complicated example, so if that doesn't make a lot of sense I can try and find a more minimal failure.

Is that an error on a forward-over-reverse Hessian calculation?
> It's a bit of a complicated example, so if that doesn't make a lot of sense I can try and find a more minimal failure.
At first glance, I don't think this is a Pathfinder issue. We only use `log2π` in one place, and it's outside of anything AD should be seeing. Does this work fine?

```julia
using Turing, ReverseDiff, Optimization, OptimizationOptimJL

prob = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)
sol = solve(prob, LBFGS())
```
If so, this is more or less what Pathfinder is doing internally, so the issue is probably somewhere in Pathfinder. If this fails, then the issue is somewhere else. Can Turing sample fine with HMC (as in, compute the gradient of the log-density without erroring)?
Can you post the full stack trace?
> Is that an error on a forward-over-reverse Hessian calculation?
Not certain. Neither L-BFGS nor Pathfinder computes the Hessian, so if your model doesn't either, I don't see why this would be forward-over-reverse. But ReverseDiff does use ForwardDiff internally in a few places.
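For reference, the "more or less what Pathfinder is doing internally" above boils down to something like the following sketch (toy objective, not Pathfinder's actual code): the log-density is wrapped in an `OptimizationFunction` carrying the `adtype`, and Optimization.jl's AD machinery builds the gradient before L-BFGS runs.

```julia
using Optimization, OptimizationOptimJL, ADTypes, ReverseDiff

# negative log-density of a toy target, since Optimization.jl minimizes
nlogp(x, _p) = sum(abs2, x) / 2

# the adtype passed to pathfinder is forwarded to OptimizationFunction,
# so Optimization.jl constructs the gradient with the chosen backend
optfun = OptimizationFunction(nlogp, AutoReverseDiff(true))
optprob = OptimizationProblem(optfun, randn(5))
sol = solve(optprob, LBFGS())
```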
> > It's a bit of a complicated example, so if that doesn't make a lot of sense I can try and find a more minimal failure.
>
> Does this work fine?
>
> ```julia
> using Turing, ReverseDiff, Optimization, OptimizationOptimJL
>
> prob = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)
> sol = solve(prob, LBFGS())
> ```
For some reason that doesn't generate an `init`; I thought there was a fallback?

```
ERROR: MethodError: no method matching init(::@NamedTuple{…}, ::LBFGS{…})

Closest candidates are:
  init(::OptimizationProblem, ::Any, Any...; kwargs...) @ SciMLBase ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:164
  init(::Roots.ZeroProblem, ::Any; kwargs...) @ Roots ~/.julia/packages/Roots/neTBD/src/find_zero.jl:306
  init(::Roots.AbstractUnivariateZeroMethod, ::Any, ::Roots.AbstractUnivariateZeroState) @ Roots ~/.julia/packages/Roots/neTBD/src/find_zero.jl:311
  ...

Stacktrace:
 [1] solve(::@NamedTuple{…}, ::Vararg{…}; kwargs::@Kwargs{}) @ CommonSolve ~/.julia/packages/CommonSolve/JfpfI/src/CommonSolve.jl:23
 [2] top-level scope @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.
```
EDIT: `Turing.optim_problem` does create a `Turing.Optimisation.Init` object as a field of `prob`, but it's obviously not dispatching for some reason.
> Can you post the full stack trace?

Here you go:
```
ERROR: MethodError: promote_rule(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationReverseDiffExt.OptimizationReverseDiffTag, Float64}, Float64, 12}}) is ambiguous.

Candidates:
  promote_rule(::Type{S}, ::Type{T}) where {S<:AbstractIrrational, T<:Number} @ Base irrationals.jl:45
  promote_rule(::Type{<:AbstractIrrational}, ::Type{T}) where T<:Real @ Base irrationals.jl:44
  promote_rule(::Type{R}, ::Type{ForwardDiff.Dual{T, V, N}}) where {R<:Real, T, V, N} @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/dual.jl:427

Possible fix, define
  promote_rule(::Type{R}, ::Type{ForwardDiff.Dual{T, V, N}}) where {R<:AbstractIrrational, T, V, N}

Stacktrace:
  [1] promote_type(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 12}}) @ Base ./promotion.jl:313
  [2] promote_type(::Type, ::Type, ::Type) @ Base ./promotion.jl:299
  [3] ForwardOptimize @ ~/.julia/packages/ReverseDiff/UJhiD/src/macros.jl:122 [inlined]
  [4] + @ ~/.julia/packages/ReverseDiff/UJhiD/src/derivatives/scalars.jl:17 [inlined]
  [5] normlogpdf(z::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing}) @ StatsFuns ~/.julia/packages/StatsFuns/mrf0e/src/distrs/norm.jl:29
  [6] normlogpdf(μ::Float64, σ::Float64, x::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing}) @ StatsFuns ~/.julia/packages/StatsFuns/mrf0e/src/distrs/norm.jl:41
  [7] logpdf(d::Normal{Float64}, x::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing}) @ Distributions ~/.julia/packages/Distributions/UaWBm/src/univariates.jl:645
  [8] logpdf(d::Truncated{…}, x::ReverseDiff.TrackedReal{…}) @ Distributions ~/.julia/packages/Distributions/UaWBm/src/truncate.jl:161
  [9] tilde_assume(ctx::Turing.Optimisation.OptimizationContext{…}, dist::Truncated{…}, vn::VarName{…}, vi::DynamicPPL.ThreadSafeVarInfo{…}) @ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:60
 [10] tilde_assume @ ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:63 [inlined]
 [11] tilde_assume @ ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:57 [inlined]
 [12] tilde_assume!!(context::DynamicPPL.FixedContext{…}, right::Truncated{…}, vn::VarName{…}, vi::DynamicPPL.ThreadSafeVarInfo{…}) @ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:138
 [13] generate_latent(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::DynamicPPL.FixedContext{…}, latent_model::RandomWalk{…}, n::Int64) @ EpiAware.EpiLatentModels ~/GitHub/CFA/Rt-without-renewal/EpiAware/src/EpiLatentModels/randomwalk.jl:89
 [14] _evaluate!! @ ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:963 [inlined]
 [15] macro expansion @ ~/.julia/packages/DynamicPPL/pg70d/src/submodel_macro.jl:244 [inlined]
 [16] make_epi_aware(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::DynamicPPL.FixedContext{…}, y_t::Vector{…}, time_steps::Int64; epi_model::DirectInfections{…}, latent_model::RandomWalk{…}, observation_model::DelayObservations{…}) @ EpiAware ~/GitHub/CFA/Rt-without-renewal/EpiAware/src/make_epi_aware.jl:8
 [17] _evaluate!!(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…}) @ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:963
 [18] evaluate_threadsafe!!(model::Model{…}, varinfo::TypedVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…}) @ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:952
 [19] evaluate!!(model::Model{…}, varinfo::TypedVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…}) @ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:887
 [20] (::LogDensityFunction{…})(z::ReverseDiff.TrackedArray{…}) @ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:111
 [21] (::Turing.Optimisation.var"#l#4"{…})(x::ReverseDiff.TrackedArray{…}, ::SciMLBase.NullParameters) @ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:294
 [22] (::OptimizationReverseDiffExt.var"#51#75"{…})(::ReverseDiff.TrackedArray{…}) @ OptimizationReverseDiffExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationReverseDiffExt.jl:157
 [23] ReverseDiff.GradientTape(f::OptimizationReverseDiffExt.var"#51#75"{…}, input::Vector{…}, cfg::ReverseDiff.GradientConfig{…}) @ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/api/tape.jl:199
 [24] ReverseDiff.GradientTape(f::Function, input::Vector{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 12}}) @ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/api/tape.jl:198
 [25] instantiate_function(f::OptimizationFunction{…}, cache::OptimizationBase.ReInitCache{…}, adtype::AutoReverseDiff, num_cons::Int64) @ OptimizationReverseDiffExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationReverseDiffExt.jl:187
 [26] OptimizationCache(prob::OptimizationProblem{…}, opt::LBFGS{…}, data::Base.Iterators.Cycle{…}; callback::Pathfinder.OptimizationCallback{…}, maxiters::Int64, maxtime::Nothing, abstol::Nothing, reltol::Nothing, progress::Bool, kwargs::@Kwargs{}) @ OptimizationBase ~/.julia/packages/OptimizationBase/rRpJs/src/cache.jl:27
 [27] OptimizationCache @ ~/.julia/packages/OptimizationBase/rRpJs/src/cache.jl:17 [inlined]
 [28] #__init#2 @ ~/.julia/packages/OptimizationOptimJL/fdBis/src/OptimizationOptimJL.jl:101 [inlined]
 [29] __init (repeats 2 times) @ ~/.julia/packages/OptimizationOptimJL/fdBis/src/OptimizationOptimJL.jl:68 [inlined]
 [30] #init#599 @ ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:166 [inlined]
 [31] init @ ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:164 [inlined]
 [32] solve(::OptimizationProblem{…}, ::LBFGS{…}; kwargs::@Kwargs{…}) @ SciMLBase ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:96
 [33] optimize_with_trace(prob::OptimizationProblem{…}, optimizer::LBFGS{…}; progress_name::String, progress_id::Base.UUID, maxiters::Int64, callback::Nothing, fail_on_nonfinite::Bool, kwargs::@Kwargs{}) @ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/optimize.jl:53
 [34] optimize_with_trace @ ~/.julia/packages/Pathfinder/99eNx/src/optimize.jl:34 [inlined]
 [35] _pathfinder(rng::Random._GLOBAL_RNG, prob::OptimizationProblem{…}, logp::Pathfinder.var"#logp#26"{…}; history_length::Int64, optimizer::LBFGS{…}, ndraws_elbo::Int64, executor::SequentialEx{…}, kwargs::@Kwargs{…}) @ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:290
 [36] _pathfinder @ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:279 [inlined]
 [37] _pathfinder_try_until_succeed(rng::Random._GLOBAL_RNG, prob::OptimizationProblem{…}, logp::Pathfinder.var"#logp#26"{…}; ntries::Int64, init_scale::Int64, init_sampler::Pathfinder.UniformSampler{…}, allow_mutating_init::Bool, kwargs::@Kwargs{…}) @ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:265
 [38] _pathfinder_try_until_succeed @ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:253 [inlined]
 [39] #25 @ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:182 [inlined]
 [40] progress(f::Pathfinder.var"#25#27"{…}; name::String) @ ProgressLogging ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:262
 [41] progress @ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:258 [inlined]
 [42] pathfinder(prob::OptimizationProblem{…}; rng::Random._GLOBAL_RNG, history_length::Int64, optimizer::LBFGS{…}, ndraws_elbo::Int64, ndraws::Int64, input::Model{…}, kwargs::@Kwargs{}) @ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:181
 [43] pathfinder(model::Model{…}; rng::Random._GLOBAL_RNG, init_scale::Int64, init_sampler::Pathfinder.UniformSampler{…}, init::Nothing, adtype::AutoReverseDiff, kwargs::@Kwargs{}) @ PathfinderTuringExt ~/.julia/packages/Pathfinder/99eNx/ext/PathfinderTuringExt.jl:118
 [44] top-level scope @ ~/GitHub/CFA/Rt-without-renewal/EpiAware/docs/src/examples/getting_started.jl:295
Some type information was truncated. Use `show(err)` to see complete types.
```
> Can Turing sample fine with HMC (as in, compute the gradient of the log-density without erroring)?

Yes, the workflow is that pathfinder is part of the initialisation for an HMC/NUTS sampler, and NUTS runs ok (even better with good pre-heating!).
Thanks for looking into this!
@SamuelBrand1 this line is using deprecated syntax: https://github.com/CDCgov/Rt-without-renewal/blob/d6344cc6e451e3e6c4188e4984247f890ae60795/EpiAware/src/EpiLatentModels/utils.jl#L19, which could be the issue here.

Try this instead, and let me know if it works:

```julia
truncated(Normal(0, prior_mean * sqrt(pi) / sqrt(2)); lower=0)
```
That doesn't solve this problem.

However, switching to `AutoReverseDiff(false)` does. The model I'm sampling can fail because it can sample some big numbers in the initialisation phase, which causes problems with sampling from a negative binomial; this seems to cause problems for `pathfinder` with ReverseDiff, but is either not a problem or a silent problem for AdvancedHMC NUTS calls.
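(Concretely, the switch is just the `adtype` passed to `pathfinder`; a sketch of the call, assuming the Turing-model method from this PR and my model `inference_mdl`:)

```julia
using Pathfinder, Turing, ReverseDiff, ADTypes

# the compiled-tape backend, AutoReverseDiff(true), was erroring here;
# the non-compiling backend re-records the tape on each call and works
result = pathfinder(inference_mdl; adtype = AutoReverseDiff(false))
```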
I think making the model more resilient is our problem though!
Thanks for helping so much.
> However, switching to `AutoReverseDiff(false)` does.
Ah, okay. Tape compilation is only safe if the same branches of all control flow are taken on every execution of the program. I'm not certain what control flow you might be encountering here. If this is the issue, it's maybe a little strange that Turing and Pathfinder could hit different branches. One way this can happen is if the bulk of the probability mass hits one branch while the mode hits a different branch, since modes can look very different from the bulk; then you expect any good optimizer to hit a different branch.
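To illustrate the pitfall with a toy function (nothing to do with your model): a compiled ReverseDiff tape replays whichever branch was taken when the tape was recorded, so inputs that should take the other branch give silently wrong gradients rather than an error.

```julia
using ReverseDiff

# value-dependent control flow: which branch runs depends on x[1]
f(x) = x[1] > 0 ? sum(abs2, x) : sum(abs, x)

x0 = [1.0, 2.0]
tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x0))

ReverseDiff.gradient!(similar(x0), tape, [2.0, 3.0])   # same branch as x0: correct
ReverseDiff.gradient!(similar(x0), tape, [-2.0, 3.0])  # other branch: silently wrong
```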
Either way, if `AutoReverseDiff(false)` is working with Pathfinder, that's a good indication that this PR is doing the right thing and the issue is probably elsewhere.
> Thanks for helping so much.
No problem!