Link/Provide full tested tutorial for Optimization in FAQ
Is your feature request related to a problem? Please describe.
I find it hard to translate the help given in FAQ on Optimization/AD into code that actually performs a parameter optimization.
Other also encountered difficulties with Optimization:
- #2979
- #2606
- (The currently proposed solution there also uses
setpwith the associated problems with Zygote)
- (The currently proposed solution there also uses
Describe the solution you’d like
- Provide a short tutorial of an executable code, that actually optimizes a subset of parameters and initial conditions of an ODEProblem derived from a MTK system.
- Implement tests for this problem that use different Optimizers and AD backends (should include ForwardDiff and Zygote)
Describe alternatives you’ve considered
Additional context
I propose a modification of the SciMLSensitivity example .
I provide the following code as a start for the tutorial. It tries 3 different approaches of updating the Problem, but encounters several obstacles. The third alternative works, but only using ForwardDiff.jl and is rather complicated.
The example uses the standard quite simple Lotka-Volterra problem, but simulates some complexity by using an non-scalar parameter, px[1:2] = [α, β].
import Pkg;
Pkg.activate(;temp=true)
Pkg.add(["OrdinaryDiffEq","Optimization","OptimizationPolyalgorithms","SciMLSensitivity",
"Zygote","ForwardDiff","ModelingToolkit","SymbolicIndexingInterface",
"SciMLStructures","SciMLBase","ComponentArrays","Plots"])
import OrdinaryDiffEq as ODE
import Optimization as OPT
import OptimizationPolyalgorithms as OPA
import SciMLSensitivity as SMS
import Zygote
import ForwardDiff
using SciMLBase
using ComponentArrays
#@usingany Plots: Plots
#using Plots: Plots
using ModelingToolkit, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D, ModelingToolkit as MTK
using SymbolicIndexingInterface: SymbolicIndexingInterface as SII
using SciMLStructures: SciMLStructures as SS
@variables x(t) y(t) z(t)
@parameters px[1:2]=[1.5, 1.0] γ=3.0 δ=1.0
eqs = [D(x) ~ px[1] * x - px[2] * x * y
D(y) ~ -γ * y + δ * x * y
z ~ x + y]
@named sys = System(eqs, t)
simpsys = mtkcompile(sys)
tsteps = 0.0:0.1:10.0
tspan = extrema(tsteps)
prob = ODEProblem(simpsys, [x => 1.1, y => 1.2], tspan)
sol = sol_true = ODE.solve(prob, ODE.Tsit5(), saveat = tsteps)
#Plots.plot(sol)
paropt = [px, γ]
paropt_sym = Symbol.(paropt)
n_paropt = sum(length.(paropt))
stateopt = [x]
stateopt_sym = Symbol.(stateopt)
#p_true = vcat(prob.ps[paropt]..., prob[stateopt]...)
p_true = vcat(
ComponentVector(;zip(paropt_sym,prob.ps[paropt])...),
ComponentVector(;zip(stateopt_sym,prob[stateopt])...))
p0 = p = p_true .+ randn(length(p_true)) .* 0.2
probo = remake(prob) # copy
nest_structure(p, syms) = [p[k] for k in syms]
parstateopt = vcat(paropt, stateopt)
parstateopt_sym = Symbol.(parstateopt)
p0n = nest_structure(p0, parstateopt_sym)
#--------------- recommended method for optimization at MTK.FAQ
# obstacle: do not find documentation for initial conditions
# obstacle: problems with both Zygote and ForwardDiff
setter! = SII.setp(simpsys, paropt)
setter!(probo, nest_structure(p0, paropt_sym))
function loss(p)
local p_struc = nest_structure(p, paropt_sym) # omit u0
setter!(probo, p_struc)
local sol = ODE.solve(probo, ODE.Tsit5(), saveat = tsteps)
local loss = sum(abs2, sol .- sol_true)
return loss
end
loss(p0)
#Zygote.gradient(loss, p0)
#ForwardDiff.gradient(loss, p0)
#------------- alternative using indices,
# documentation for initial values
# obstacle: problems with both Zygote and ForwardDiff
probo = remake(prob)
ps_ind = MTK.parameter_index.(Ref(simpsys), paropt)
setindex!.(Ref(probo.ps), nest_structure(p0, paropt_sym), ps_ind)
probo.ps[px] == p0[Symbol("px[1:2]")]
x_ind = MTK.variable_index.(Ref(simpsys), stateopt) # 2
#setindex!(probo.ps, [p0[3]], x_ind)
function loss(p)
setindex!.(Ref(probo.ps), nest_structure(p, paropt_sym), ps_ind)
# for simplicity, start omitting u0
local xv = probo.u0
# local xvn = [begin
# ii = findfirst(==(i), x_ind)
# isnothing(ii) ? xv[i] : p[length(ps_ind) + ii]
# end for i in axes(xv,1)]
#local xvn = vcat(xv[1], p[3])
#probox = remake(probo, u0 = xvn)
local sol = ODE.solve(probo, ODE.Tsit5(), saveat = tsteps)
local loss = sum(abs2, sol .- sol_true)
return loss
end
loss(p0)
Zygote.gradient(loss, p0) # nothing?
#ForwardDiff.gradient(loss, p0)
#--------------- alternative: SciMLStructures
# works with ForwardDiff
# obstacle: how to infere positions of optimized parameters in canicalized buffer?
# obstacle: error or zero Zygote.gradient for initial conditions ?
probo = remake(prob)
function loss(p)
local sol = compute_sol(p, probo)
local loss = SciMLBase.successful_retcode(sol) ? sum(abs2, sol .- sol_true) : 1e30
return loss
end
function compute_sol(p, probo) # variant without Zygote compile error, but not general
local pv = SII.parameter_values(probo)
local buf, _, _ = SS.canonicalize(SS.Tunable(), pv)
local bufx, _, _ = SS.canonicalize(SS.Initials(), pv)
local bufn = vcat(p[1:2], buf[3], p[3]) # TODO describe general
local pvn = SS.replace(SS.Tunable(), pv, bufn)
local bufxn = vcat(bufx[1], p[4], bufx[3:end]) # TODO describe general
local pvn2 = SS.replace(SS.Initials(), pvn, bufxn)
#local probon = remake(probo; u0 = xvn, p = pvn) # u0 set to pvn.initials instead
local probon = remake(probo; p = pvn2)
local probon2 = probon
#local xvn = vcat(xv[1], p[3])
#local probon2 = remake(probon; u0 = xvn) # mutating?
#@show probon2.u0, probon2.p
local sol = ODE.solve(probon2, ODE.Tsit5(), saveat = tsteps)
return sol
end
psetter! = SII.setp(probo, paropt)
ssetter! = SII.setu(probo, stateopt)
function compute_sol(p, probo)
local pv = SII.parameter_values(probo)
local buf, _, _ = SS.canonicalize(SS.Tunable(), pv)
local bufx, _, _ = SS.canonicalize(SS.Initials(), pv)
# for ForwardDiff need to convert the eltype of the entire portions
ET = eltype(p)
local pvt = pv
pvt = eltype(buf) == ET ? pvt : SS.replace(SS.Tunable(), pvt, ET.(buf))
pvt = eltype(bufx) == ET ? pvt : SS.replace(SS.Initials(), pvt, ET.(bufx))
#local psetter! = SII.setp(probo, paropt)
local p_struc = nest_structure(p, paropt_sym)
psetter!(pvt, p_struc)
#local ssetter! = SII.setu(probo, stateopt)
local s_struc = nest_structure(p, stateopt_sym)
#ssetter!(pvt, s_struc) # state_values(MTKParameters) not implemented
local probon2 = remake(probo, p = pvt)
ssetter!(probon2, s_struc)
#local sol = ODE.solve(probon2, ODE.Tsit5(), saveat = tsteps)
# need another remake to update probon2.p.initial to probon2.u0
local probon3 = remake(probon2, u0=probon2.u0)
#@show probon3.u0, probon3.p
local sol = ODE.solve(probon3, ODE.Tsit5(), saveat = tsteps)
return sol
end
#include("tmp/test.jl")
loss(p0)
loss(p)
loss(p_true)
#Zygote.gradient(loss, p0)
ForwardDiff.gradient(loss, p0)
callback = function (state, l)
display(l)
# p = state.u
# sol = compute_sol(p)
# plt = Plots.plot(sol, ylim = (0, 7))
# display(plt)
# Tell Optimization.solve to not halt the optimization. If return true, then
# optimization stops.
return false
end
adtype = OPT.AutoForwardDiff()
#adtype = OPT.AutoZygote()
optf = OPT.OptimizationFunction((x, p) -> loss(x), adtype)
optprob = OPT.OptimizationProblem(optf, p0)
opt_alg = OPA.PolyOpt()
#opt_alg = OPA.LBFGS()
result_ode = OPT.solve(optprob, opt_alg,
callback = callback,
maxiters = 20,
)
result_ode.stats
p = result_ode.u
hcat(p0, p, p_true)
loss(result_ode.u)
Did you see https://docs.sciml.ai/ModelingToolkit/dev/examples/remake/ ? The details are a bit baroque, but it does provide a parameter optimization example.
Thanks @cstjean. This is very similar to what i discovered with the third alternative and comes close of what I asked for. I created a pull request to the documentation: #3955.
Is there a list of which AD backends for optimizing MKTSystem parameters are supported/tested?
Background: I am working on hybrid problems (combining process based modeling with machine learning), where the parameters of the ODEProblem are only a small subset of a much larger parameter vector, that includes ML-weights. For these kinds of problems, reverse AD is much more efficient. Currently, I use Zygote - which unfortunately does not work with setp from SymbolicIndexingInterface.
It's just the same list as SciMLSensitivity
Currently, I use Zygote - which unfortunately does not work with setp from SymbolicIndexingInterface.'
It should? There's tests with it. You're building the function outside of the differentiation context right?
A call to Zygote.gradient on the loss function reports a mutation error.
Did I confuse something? Should I write a bug report?
The error:
julia> Zygote.gradient(x -> loss(x, parg), p0)[1]
ERROR: Mutating arrays is not supported -- called setindex!(Vector{Float64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] _throw_mutation_error(f::Function, args::Vector{Float64})
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/lib/array.jl:70
[3] (::Zygote.var"#553#554"{Vector{Float64}})(::Nothing)
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/lib/array.jl:82
[4] (::Zygote.var"#2643#back#555"{Zygote.var"#553#554"{Vector{Float64}}})(Δ::Nothing)
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/ZygoteRules/CkVIK/src/adjoint.jl:72
[5] set_parameter!
@ ~/scratch/twutz/julia_cluster_depots/packages/ModelingToolkit/bdYND/src/systems/parameter_buffer.jl:398 [inlined]
[6] (::Zygote.Pullback{Tuple{typeof(set_parameter!), MTKParameters{Vector{…}, Vector{…}, Tuple{}, Tuple{}, Tuple{}, Tuple{}}, Float64, ModelingToolkit.ParameterIndex{Tunable, Int64}}, Any})(Δ::Nothing)
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/compiler/interface2.jl:0
[7] SetParameterIndex
@ ~/scratch/twutz/julia_cluster_depots/packages/SymbolicIndexingInterface/mg9eW/src/parameter_indexing.jl:699 [inlined]
[8] (::Zygote.Pullback{Tuple{SymbolicIndexingInterface.SetParameterIndex{…}, MTKParameters{…}, Float64}, Tuple{Zygote.var"#2200#back#317"{…}, Zygote.Pullback{…}}})(Δ::Nothing)
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/compiler/interface2.jl:0
[9] MultipleSetters
@ ~/scratch/twutz/julia_cluster_depots/packages/SymbolicIndexingInterface/mg9eW/src/parameter_indexing.jl:717 [inlined]
The slightly modified Basic example that raises the above error:
using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using OrdinaryDiffEq
using SymbolicIndexingInterface: parameter_values, state_values
using SciMLStructures: Tunable, canonicalize, replace, replace!
using PreallocationTools
using Optimization
#using OptimizationOptimJL # BFGS()
import OptimizationPolyalgorithms as OPA
import SciMLSensitivity
import ForwardDiff
import Zygote
@parameters α β γ δ
@variables x(t) y(t)
eqs = [D(x) ~ (α - β * y) * x
D(y) ~ (δ * x - γ) * y]
@mtkcompile odesys = System(eqs, t)
odeprob = ODEProblem(
odesys, [x => 1.0, y => 1.0, α => 1.5, β => 1.0, γ => 3.0, δ => 1.0], (0.0, 10.0))
timesteps = 0.0:0.1:10.0
sol = solve(odeprob, Tsit5(); saveat = timesteps)
data = Array(sol)
# add some random noise
data = data + 0.01 * randn(size(data))
paropt = [α, β, γ, δ]
ptrue = odeprob.ps[paropt]
p0 = ptrue + 0.2*randn(length(paropt))
function loss(x, p)
odeprob = p[1] # ODEProblem stored as parameters to avoid using global variables
ps = parameter_values(odeprob) # obtain the parameter object from the problem
diffcache = p[5]
# get an appropriately typed preallocated buffer to store the `x` values in
buffer = get_tmp(diffcache, x)
# copy the current values to this buffer
copyto!(buffer, canonicalize(Tunable(), ps)[1])
# create a copy of the parameter object with the buffer
ps = replace(Tunable(), ps, buffer)
# set the updated values in the parameter object
setter = p[4]
setter(ps, x)
# remake the problem, passing in our new parameter object
newprob = remake(odeprob; p = ps)
timesteps = p[2]
sol = solve(newprob, AutoTsit5(Rosenbrock23()); saveat = timesteps)
truth = p[3]
data = Array(sol)
return sum((truth .- data) .^ 2) / length(truth)
end
# manually create an OptimizationFunction to ensure usage of `ForwardDiff`, which will
# require changing the types of parameters from `Float64` to `ForwardDiff.Dual`
adtype = Optimization.AutoForwardDiff()
adtype = Optimization.AutoZygote()
optfn = OptimizationFunction(loss, adtype)
# function to set the parameters we are optimizing
setter = setp(odeprob, paropt)
# `DiffCache` to avoid allocations.
# `copy` prevents the buffer stored by `DiffCache` from aliasing the one in
# `parameter_values(odeprob)`.
diffcache = DiffCache(copy(canonicalize(Tunable(), parameter_values(odeprob))[1]))
parg = (odeprob, timesteps, data, setter, diffcache)
loss(p0, parg)
ForwardDiff.gradient(x -> loss(x, parg), p0)
Zygote.gradient(x -> loss(x, parg), p0)[1]
# parameter object is a tuple, to store differently typed objects together
optprob = OptimizationProblem(
optfn, p0, parg,
lb = 0.1zeros(4), ub = 5ones(4))
opt_alg = OPA.PolyOpt()
opt_alg = OPA.LBFGS()
sol = solve(optprob, opt_alg)
hcat(p0, sol.u, ptrue)
You have to use the out of place version, setsym_oop if it's Zygote. That's because Zygote doesn't support mutation, so every operation has to be out of place.
Thanks @ChrisRackauckas. This advice of replacing setp by setsym_oop might by the hint I am looking for since a long time.
I found documentation on setsym_oop.
Is there an example usage?
When i try to experiment with it using the context above I get the error:
julia> Zygote.gradient(x -> loss(x, parg), p0)[1]
ERROR: Mutating arrays is not supported -- called push!(Vector{Int64}, ...)
...
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] _throw_mutation_error(f::Function, args::Vector{Int64})
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/lib/array.jl:70
[3] (::Zygote.var"#561#562"{Vector{Int64}})(::Nothing)
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/lib/array.jl:89
[4] (::Zygote.var"#2663#back#563"{Zygote.var"#561#562"{Vector{Int64}}})(Δ::Nothing)
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/ZygoteRules/CkVIK/src/adjoint.jl:72
[5] _setsym_oop
@ ~/scratch/twutz/julia_cluster_depots/packages/SymbolicIndexingInterface/mg9eW/src/state_indexing.jl:502 [inlined]
[6] (::Zygote.Pullback{Tuple{typeof(SymbolicIndexingInterface._setsym_oop), ODEProblem{…}, NotSymbolic, ScalarSymbolic, Vector{…}}, Any})(Δ::Nothing)
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/compiler/interface2.jl:0
[7] setsym_oop
@ ~/scratch/twutz/julia_cluster_depots/packages/SymbolicIndexingInterface/mg9eW/src/state_indexing.jl:437 [inlined]
[8] (::Zygote.Pullback{Tuple{typeof(setsym_oop), ODEProblem{…}, Vector{…}}, Tuple{Zygote.Pullback{…}, Zygote.Pullback{…}, Zygote.Pullback{…}, Zygote.Pullback{…}}})(Δ::Nothing)
@ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/compiler/interface2.jl:0
[9] loss
@ /Net/Groups/BGI/people/twutz/julia/dev/MTKHelpers/tmp/test.jl:3 [inlined]
With the code:
setter_oop = setsym_oop(odeprob, paropt)
parg_oop = (odeprob, timesteps, data, setter_oop)
function loss(x, p)
local odeprob = p[1] # ODEProblem stored as parameters to avoid using global variables
local setter_oop = p[4]
(u0new, psnew) = setter_oop(odeprob, x)
newprob = remake(odeprob, u0=u0new, p=psnew)
timesteps = p[2]
local sol = solve(newprob, AutoTsit5(Rosenbrock23()); saveat = timesteps)
local truth = p[3]
data = Array(sol)
return sum((truth .- data) .^ 2) / length(truth)
end
Zygote.gradient(x -> loss(x, parg_oop), p0)[1]
The line that causes the error is the line in the cost function:
(u0new, psnew) = setter_oop(odeprob, x)
Maybe, this discussion should go better to a new issue.
https://github.com/SciML/SciMLBase.jl/blob/933c2802fb435c5343ec8a8c045cbcf737ecd4ea/test/downstream/remake_autodiff.jl#L21 covers this
In the linked test, I see the application of the setter only outside function called by Zygote gradient. To my understanding, however, the cost function needs to differentiate the setter.
The mutation error occurs for the following code after the linked test:
x0 = [1.0, 1.0, 1.5, 1.0, 1.0, 1.0]
Zygote.gradient(x-> begin
u0new, pnew = setter(prob, x)
sum(u0new) + sum(pnew)
end, x0)[1]