ModelingToolkit.jl Link/Provide full tested tutorial for Optimization in FAQ

Is your feature request related to a problem? Please describe.

I find it hard to translate the help given in FAQ on Optimization/AD into code that actually performs a parameter optimization.

Other also encountered difficulties with Optimization:

#2979
#2606
- (The currently proposed solution there also uses setp with the associated problems with Zygote)

Describe the solution you’d like

Provide a short tutorial of an executable code, that actually optimizes a subset of parameters and initial conditions of an ODEProblem derived from a MTK system.
Implement tests for this problem that use different Optimizers and AD backends (should include ForwardDiff and Zygote)

Describe alternatives you’ve considered

Additional context

I propose a modification of the SciMLSensitivity example .

I provide the following code as a start for the tutorial. It tries 3 different approaches of updating the Problem, but encounters several obstacles. The third alternative works, but only using ForwardDiff.jl and is rather complicated.

The example uses the standard quite simple Lotka-Volterra problem, but simulates some complexity by using an non-scalar parameter, px[1:2] = [α, β].

import Pkg; 
Pkg.activate(;temp=true)
Pkg.add(["OrdinaryDiffEq","Optimization","OptimizationPolyalgorithms","SciMLSensitivity",
  "Zygote","ForwardDiff","ModelingToolkit","SymbolicIndexingInterface",
  "SciMLStructures","SciMLBase","ComponentArrays","Plots"])
import OrdinaryDiffEq as ODE
import Optimization as OPT
import OptimizationPolyalgorithms as OPA
import SciMLSensitivity as SMS
import Zygote
import ForwardDiff
using SciMLBase
using ComponentArrays
#@usingany Plots: Plots
#using Plots: Plots

using ModelingToolkit, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D, ModelingToolkit as MTK
using SymbolicIndexingInterface: SymbolicIndexingInterface as SII
using SciMLStructures: SciMLStructures as SS

@variables x(t) y(t) z(t)
@parameters px[1:2]=[1.5, 1.0] γ=3.0 δ=1.0

eqs = [D(x) ~ px[1] * x - px[2] * x * y
       D(y) ~ -γ * y + δ * x * y
       z ~ x + y]

@named sys = System(eqs, t)
simpsys = mtkcompile(sys)
tsteps = 0.0:0.1:10.0
tspan = extrema(tsteps)
prob = ODEProblem(simpsys, [x => 1.1, y => 1.2], tspan)
sol = sol_true = ODE.solve(prob, ODE.Tsit5(), saveat = tsteps)
#Plots.plot(sol)

paropt = [px, γ]
paropt_sym = Symbol.(paropt)
n_paropt = sum(length.(paropt))
stateopt = [x]
stateopt_sym = Symbol.(stateopt)
#p_true = vcat(prob.ps[paropt]..., prob[stateopt]...) 
p_true = vcat(
    ComponentVector(;zip(paropt_sym,prob.ps[paropt])...),
    ComponentVector(;zip(stateopt_sym,prob[stateopt])...))
p0 = p = p_true .+ randn(length(p_true)) .* 0.2
probo = remake(prob) # copy
nest_structure(p, syms) = [p[k] for k in syms]
parstateopt = vcat(paropt, stateopt)
parstateopt_sym = Symbol.(parstateopt)
p0n = nest_structure(p0, parstateopt_sym)

#--------------- recommended method for optimization at MTK.FAQ
# obstacle: do not find documentation for initial conditions
# obstacle: problems with both Zygote and ForwardDiff
setter! = SII.setp(simpsys, paropt)
setter!(probo, nest_structure(p0, paropt_sym)) 
function loss(p)
    local p_struc = nest_structure(p, paropt_sym) #  omit u0
    setter!(probo, p_struc)
    local sol = ODE.solve(probo, ODE.Tsit5(), saveat = tsteps)
    local loss = sum(abs2, sol .- sol_true)
    return loss
end
loss(p0)
#Zygote.gradient(loss, p0)
#ForwardDiff.gradient(loss, p0)


#------------- alternative using indices, 
# documentation for initial values
# obstacle: problems with both Zygote and ForwardDiff 
probo = remake(prob)
ps_ind = MTK.parameter_index.(Ref(simpsys), paropt)
setindex!.(Ref(probo.ps), nest_structure(p0, paropt_sym), ps_ind)
probo.ps[px] == p0[Symbol("px[1:2]")]
x_ind = MTK.variable_index.(Ref(simpsys), stateopt) # 2
#setindex!(probo.ps, [p0[3]], x_ind)

function loss(p)
    setindex!.(Ref(probo.ps), nest_structure(p, paropt_sym), ps_ind)
    # for simplicity, start omitting u0
    local xv = probo.u0
    # local xvn = [begin
    #     ii = findfirst(==(i), x_ind)
    #     isnothing(ii) ? xv[i] : p[length(ps_ind) + ii]
    # end for i in  axes(xv,1)]
    #local xvn = vcat(xv[1], p[3])
    #probox = remake(probo, u0 = xvn) 
    local sol = ODE.solve(probo, ODE.Tsit5(), saveat = tsteps)
    local loss = sum(abs2, sol .- sol_true)
    return loss
end
loss(p0)
Zygote.gradient(loss, p0) # nothing?
#ForwardDiff.gradient(loss, p0)


#--------------- alternative: SciMLStructures
# works with ForwardDiff
# obstacle: how to infere positions of optimized parameters in canicalized buffer?
# obstacle: error or zero Zygote.gradient for initial conditions ?
probo = remake(prob)
function loss(p)
    local sol = compute_sol(p, probo)
    local loss = SciMLBase.successful_retcode(sol) ? sum(abs2, sol .- sol_true) : 1e30
    return loss
end
function compute_sol(p, probo) # variant without Zygote compile error, but not general
    local pv = SII.parameter_values(probo)
    local buf, _, _ = SS.canonicalize(SS.Tunable(), pv)
    local bufx, _, _ = SS.canonicalize(SS.Initials(), pv)
    local bufn = vcat(p[1:2], buf[3], p[3])       # TODO describe general
    local pvn = SS.replace(SS.Tunable(), pv, bufn)
    local bufxn = vcat(bufx[1], p[4], bufx[3:end]) # TODO describe general
    local pvn2 = SS.replace(SS.Initials(), pvn, bufxn)
    #local probon = remake(probo; u0 = xvn, p = pvn) # u0 set to pvn.initials instead
    local probon = remake(probo; p = pvn2) 
    local probon2 = probon
    #local xvn = vcat(xv[1], p[3])
    #local probon2 = remake(probon; u0 = xvn) # mutating?
    #@show probon2.u0, probon2.p
    local sol = ODE.solve(probon2, ODE.Tsit5(), saveat = tsteps)
    return sol
end
psetter! = SII.setp(probo, paropt)
ssetter! = SII.setu(probo, stateopt)
function compute_sol(p, probo)
    local pv = SII.parameter_values(probo)
    local buf, _, _ = SS.canonicalize(SS.Tunable(), pv)
    local bufx, _, _ = SS.canonicalize(SS.Initials(), pv)
    # for ForwardDiff need to convert the eltype of the entire portions
    ET = eltype(p)
    local pvt = pv
    pvt = eltype(buf) == ET ? pvt : SS.replace(SS.Tunable(), pvt, ET.(buf)) 
    pvt = eltype(bufx) == ET ? pvt : SS.replace(SS.Initials(), pvt, ET.(bufx)) 
    #local psetter! = SII.setp(probo, paropt)
    local p_struc = nest_structure(p, paropt_sym) 
    psetter!(pvt, p_struc)
    #local ssetter! = SII.setu(probo, stateopt)
    local s_struc = nest_structure(p, stateopt_sym) 
    #ssetter!(pvt, s_struc)   # state_values(MTKParameters) not implemented
    local probon2 = remake(probo, p = pvt)
    ssetter!(probon2, s_struc)
    #local sol = ODE.solve(probon2, ODE.Tsit5(), saveat = tsteps)
    # need another remake to update probon2.p.initial to probon2.u0
    local probon3 = remake(probon2, u0=probon2.u0)
    #@show probon3.u0, probon3.p
    local sol = ODE.solve(probon3, ODE.Tsit5(), saveat = tsteps)
    return sol
end
#include("tmp/test.jl")
loss(p0)
loss(p)
loss(p_true)
#Zygote.gradient(loss, p0)
ForwardDiff.gradient(loss, p0)

callback = function (state, l)
    display(l)
    # p = state.u
    # sol = compute_sol(p)
    # plt = Plots.plot(sol, ylim = (0, 7))
    # display(plt)
    # Tell Optimization.solve to not halt the optimization. If return true, then
    # optimization stops.
    return false
end

adtype = OPT.AutoForwardDiff()
#adtype = OPT.AutoZygote()
optf = OPT.OptimizationFunction((x, p) -> loss(x), adtype)
optprob = OPT.OptimizationProblem(optf, p0)

opt_alg = OPA.PolyOpt()
#opt_alg = OPA.LBFGS()
result_ode = OPT.solve(optprob, opt_alg,
    callback = callback,
    maxiters = 20,
    )

result_ode.stats
p = result_ode.u
hcat(p0, p, p_true)
loss(result_ode.u)

Oct 02 '25 08:10 bgctw

Did you see https://docs.sciml.ai/ModelingToolkit/dev/examples/remake/ ? The details are a bit baroque, but it does provide a parameter optimization example.

Oct 03 '25 01:10 cstjean

Thanks @cstjean. This is very similar to what i discovered with the third alternative and comes close of what I asked for. I created a pull request to the documentation: #3955.

Oct 06 '25 09:10 bgctw

Is there a list of which AD backends for optimizing MKTSystem parameters are supported/tested?

Background: I am working on hybrid problems (combining process based modeling with machine learning), where the parameters of the ODEProblem are only a small subset of a much larger parameter vector, that includes ML-weights. For these kinds of problems, reverse AD is much more efficient. Currently, I use Zygote - which unfortunately does not work with setp from SymbolicIndexingInterface.

Oct 06 '25 09:10 bgctw

It's just the same list as SciMLSensitivity

Oct 06 '25 09:10 ChrisRackauckas

Currently, I use Zygote - which unfortunately does not work with setp from SymbolicIndexingInterface.'

It should? There's tests with it. You're building the function outside of the differentiation context right?

Oct 06 '25 09:10 ChrisRackauckas

A call to Zygote.gradient on the loss function reports a mutation error. Did I confuse something? Should I write a bug report? The error:

julia> Zygote.gradient(x -> loss(x, parg), p0)[1]
ERROR: Mutating arrays is not supported -- called setindex!(Vector{Float64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _throw_mutation_error(f::Function, args::Vector{Float64})
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/lib/array.jl:70
  [3] (::Zygote.var"#553#554"{Vector{Float64}})(::Nothing)
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/lib/array.jl:82
  [4] (::Zygote.var"#2643#back#555"{Zygote.var"#553#554"{Vector{Float64}}})(Δ::Nothing)
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/ZygoteRules/CkVIK/src/adjoint.jl:72
  [5] set_parameter!
    @ ~/scratch/twutz/julia_cluster_depots/packages/ModelingToolkit/bdYND/src/systems/parameter_buffer.jl:398 [inlined]
  [6] (::Zygote.Pullback{Tuple{typeof(set_parameter!), MTKParameters{Vector{…}, Vector{…}, Tuple{}, Tuple{}, Tuple{}, Tuple{}}, Float64, ModelingToolkit.ParameterIndex{Tunable, Int64}}, Any})(Δ::Nothing)
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/compiler/interface2.jl:0
  [7] SetParameterIndex
    @ ~/scratch/twutz/julia_cluster_depots/packages/SymbolicIndexingInterface/mg9eW/src/parameter_indexing.jl:699 [inlined]
  [8] (::Zygote.Pullback{Tuple{SymbolicIndexingInterface.SetParameterIndex{…}, MTKParameters{…}, Float64}, Tuple{Zygote.var"#2200#back#317"{…}, Zygote.Pullback{…}}})(Δ::Nothing)
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/compiler/interface2.jl:0
  [9] MultipleSetters
    @ ~/scratch/twutz/julia_cluster_depots/packages/SymbolicIndexingInterface/mg9eW/src/parameter_indexing.jl:717 [inlined]

The slightly modified Basic example that raises the above error:

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using OrdinaryDiffEq

using SymbolicIndexingInterface: parameter_values, state_values
using SciMLStructures: Tunable, canonicalize, replace, replace!
using PreallocationTools

using Optimization
#using OptimizationOptimJL  # BFGS()
import OptimizationPolyalgorithms as OPA
import SciMLSensitivity

import ForwardDiff
import Zygote

@parameters α β γ δ
@variables x(t) y(t)
eqs = [D(x) ~ (α - β * y) * x
       D(y) ~ (δ * x - γ) * y]
@mtkcompile odesys = System(eqs, t)


odeprob = ODEProblem(
    odesys, [x => 1.0, y => 1.0, α => 1.5, β => 1.0, γ => 3.0, δ => 1.0], (0.0, 10.0))
timesteps = 0.0:0.1:10.0
sol = solve(odeprob, Tsit5(); saveat = timesteps)
data = Array(sol)
# add some random noise
data = data + 0.01 * randn(size(data))

paropt = [α, β, γ, δ]
ptrue = odeprob.ps[paropt]
p0 =  ptrue + 0.2*randn(length(paropt))

function loss(x, p)
    odeprob = p[1] # ODEProblem stored as parameters to avoid using global variables
    ps = parameter_values(odeprob) # obtain the parameter object from the problem
    diffcache = p[5]
    # get an appropriately typed preallocated buffer to store the `x` values in
    buffer = get_tmp(diffcache, x)
    # copy the current values to this buffer
    copyto!(buffer, canonicalize(Tunable(), ps)[1])
    # create a copy of the parameter object with the buffer
    ps = replace(Tunable(), ps, buffer)
    # set the updated values in the parameter object
    setter = p[4]
    setter(ps, x)
    # remake the problem, passing in our new parameter object
    newprob = remake(odeprob; p = ps)
    timesteps = p[2]
    sol = solve(newprob, AutoTsit5(Rosenbrock23()); saveat = timesteps)
    truth = p[3]
    data = Array(sol)
    return sum((truth .- data) .^ 2) / length(truth)
end


# manually create an OptimizationFunction to ensure usage of `ForwardDiff`, which will
# require changing the types of parameters from `Float64` to `ForwardDiff.Dual`
adtype = Optimization.AutoForwardDiff()
adtype = Optimization.AutoZygote()
optfn = OptimizationFunction(loss, adtype)
# function to set the parameters we are optimizing
setter = setp(odeprob, paropt)
# `DiffCache` to avoid allocations.
# `copy` prevents the buffer stored by `DiffCache` from aliasing the one in
# `parameter_values(odeprob)`.
diffcache = DiffCache(copy(canonicalize(Tunable(), parameter_values(odeprob))[1]))


parg = (odeprob, timesteps, data, setter, diffcache)
loss(p0, parg)
ForwardDiff.gradient(x -> loss(x, parg), p0)
Zygote.gradient(x -> loss(x, parg), p0)[1]


# parameter object is a tuple, to store differently typed objects together
optprob = OptimizationProblem(
    optfn, p0, parg,
    lb = 0.1zeros(4), ub = 5ones(4))
opt_alg = OPA.PolyOpt()    
opt_alg = OPA.LBFGS()
sol = solve(optprob, opt_alg)

hcat(p0, sol.u, ptrue)

Oct 06 '25 09:10 bgctw

You have to use the out of place version, setsym_oop if it's Zygote. That's because Zygote doesn't support mutation, so every operation has to be out of place.

Oct 06 '25 09:10 ChrisRackauckas

Thanks @ChrisRackauckas. This advice of replacing setp by setsym_oop might by the hint I am looking for since a long time.

I found documentation on setsym_oop. Is there an example usage?

When i try to experiment with it using the context above I get the error:

julia> Zygote.gradient(x -> loss(x, parg), p0)[1]
ERROR: Mutating arrays is not supported -- called push!(Vector{Int64}, ...)
...
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _throw_mutation_error(f::Function, args::Vector{Int64})
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/lib/array.jl:70
  [3] (::Zygote.var"#561#562"{Vector{Int64}})(::Nothing)
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/lib/array.jl:89
  [4] (::Zygote.var"#2663#back#563"{Zygote.var"#561#562"{Vector{Int64}}})(Δ::Nothing)
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/ZygoteRules/CkVIK/src/adjoint.jl:72
  [5] _setsym_oop
    @ ~/scratch/twutz/julia_cluster_depots/packages/SymbolicIndexingInterface/mg9eW/src/state_indexing.jl:502 [inlined]
  [6] (::Zygote.Pullback{Tuple{typeof(SymbolicIndexingInterface._setsym_oop), ODEProblem{…}, NotSymbolic, ScalarSymbolic, Vector{…}}, Any})(Δ::Nothing)
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/compiler/interface2.jl:0
  [7] setsym_oop
    @ ~/scratch/twutz/julia_cluster_depots/packages/SymbolicIndexingInterface/mg9eW/src/state_indexing.jl:437 [inlined]
  [8] (::Zygote.Pullback{Tuple{typeof(setsym_oop), ODEProblem{…}, Vector{…}}, Tuple{Zygote.Pullback{…}, Zygote.Pullback{…}, Zygote.Pullback{…}, Zygote.Pullback{…}}})(Δ::Nothing)
    @ Zygote ~/scratch/twutz/julia_cluster_depots/packages/Zygote/55SqB/src/compiler/interface2.jl:0
  [9] loss
    @ /Net/Groups/BGI/people/twutz/julia/dev/MTKHelpers/tmp/test.jl:3 [inlined]

With the code:

setter_oop = setsym_oop(odeprob, paropt)
parg_oop = (odeprob, timesteps, data, setter_oop)

function loss(x, p)
    local odeprob = p[1] # ODEProblem stored as parameters to avoid using global variables
    local setter_oop = p[4]
    (u0new, psnew) = setter_oop(odeprob, x)
    newprob = remake(odeprob, u0=u0new, p=psnew)
    timesteps = p[2]
    local sol = solve(newprob, AutoTsit5(Rosenbrock23()); saveat = timesteps)
    local truth = p[3]
    data = Array(sol)
    return sum((truth .- data) .^ 2) / length(truth)
end

Zygote.gradient(x -> loss(x, parg_oop), p0)[1]

The line that causes the error is the line in the cost function:

(u0new, psnew) = setter_oop(odeprob, x)

Maybe, this discussion should go better to a new issue.

Oct 06 '25 10:10 bgctw

https://github.com/SciML/SciMLBase.jl/blob/933c2802fb435c5343ec8a8c045cbcf737ecd4ea/test/downstream/remake_autodiff.jl#L21 covers this

Oct 06 '25 10:10 ChrisRackauckas

In the linked test, I see the application of the setter only outside function called by Zygote gradient. To my understanding, however, the cost function needs to differentiate the setter.

The mutation error occurs for the following code after the linked test:

x0 = [1.0, 1.0, 1.5, 1.0, 1.0, 1.0]
Zygote.gradient(x-> begin
    u0new, pnew = setter(prob, x)
    sum(u0new) + sum(pnew)
end, x0)[1]

Oct 06 '25 10:10 bgctw