Type instability in ADNLPModels
Hello,
I am trying to implement an efficient optimization solver. To test performance, I use ADNLPModel problems from OptimizationProblems.jl. I am currently trying to catch type instabilities/allocations using JET.jl, and I have found that a lot of issues were coming from this package.
This is strange because JET reports type instabilities even for very simple operations. I guess I am doing something wrong, but I cannot find my error. I am also not sure whether this belongs here or whether I should post it on OptimizationProblems.jl.
Anyway, here is a minimal working example:
```julia
using JET, ADNLPModels, NLPModels, OptimizationProblems, OptimizationProblems.ADNLPProblems
nlp = bt1()
x = zeros(2)
@report_opt obj(nlp, x)
```
which results in
```
═════ 1 possible error found ═════
┌ obj(nlp::ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, x::Vector{Float64}) @ ADNLPModels C:\Users\mgoll\.julia\packages\ADNLPModels\ELm6c\src\nlp.jl:538
│ runtime dispatch detected: %22::Any(x::Vector{Float64})::Any
```
Additionally, this causes allocations:
```julia
@allocated obj(nlp, x)
@allocated obj(nlp, x)  # second call, after compilation
```
The second call reports 16 bytes, which is quite small, but I might have to call this function and other ADNLPModels-related functions very often.
Let me know what you think, thank you !
Mmmhhh, it is possible. Can you try adding a parametric type for the function f here:
```julia
mutable struct ADNLPModel{T, S, Si} <: AbstractADNLPModel{T, S}
  meta::NLPModelMeta{T, S}
  counters::Counters
  adbackend::ADModelBackend
  # Functions
  f
  clinrows::Si
  clincols::Si
  clinvals::S
  c!
end
```
?
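For illustration, here is a self-contained toy example (hypothetical types, not from ADNLPModels) of why an untyped field triggers runtime dispatch and allocations, while a parametric field does not:

```julia
# Toy illustration, not ADNLPModels code: `Unstable` stores f as an untyped
# (::Any) field, `Stable` stores it as a type parameter.
mutable struct Unstable
    f
end

mutable struct Stable{F}
    f::F
end

call_f(m, x) = m.f(x)

# Measure inside a function so the arguments are concretely typed.
measure(m, x) = @allocated call_f(m, x)

u = Unstable(abs2)
s = Stable(abs2)
measure(u, 1.0); measure(s, 1.0)  # warm up to exclude compilation

println(measure(u, 1.0) > 0)   # true: runtime dispatch boxes the call
println(measure(s, 1.0) == 0)  # true: fully inferred, no allocations
```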
I sometimes test like this:
```julia
nlp = OptimizationProblems.ADNLPProblems.bt1()
function check_obj_allocs(nlp)
  x = zeros(2)
  obj(nlp, x)
  return nothing
end
@allocated check_obj_allocs(nlp)
@allocated check_obj_allocs(nlp)
```
If that works, we should add an allocation test. I thought one existed for the residual and constraint functions, but I can't find it in the tests, so maybe I was only doing it opportunistically.
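A sketch of what such an allocation test could look like (assuming the standard Test framework; the problem name and dimensions follow the bt1 example above, and the warm-up call keeps compilation out of the measurement):

```julia
using Test
using ADNLPModels, NLPModels, OptimizationProblems, OptimizationProblems.ADNLPProblems

@testset "obj allocations" begin
    nlp = OptimizationProblems.ADNLPProblems.bt1()
    x = zeros(2)
    obj(nlp, x)                          # warm up: compile before measuring
    @test (@allocated obj(nlp, x)) == 0  # steady-state call should not allocate
end
```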
@tmigot, adding the parametric types did indeed solve the runtime dispatch error from JET.
I will open a PR as well as add unit tests.
cc. @dpo @amontoison
TL;DR: There is a type instability with the adbackend field of the ADNLPModel struct. Fixing it produces a substantial speedup for most functions associated with the model, but also means that we can no longer change the backend type once the struct has been initialized.
There are still lots of type instabilities here and there. For instance,
```julia
using JET, ADNLPModels, NLPModels, OptimizationProblems, OptimizationProblems.ADNLPProblems
nlp = bt1()
x = zeros(2)
g = zeros(2)
@report_opt grad!(nlp, x, g)
```
```
═════ 1 possible error found ═════
┌ grad!(nlp::ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, x::Vector{Float64}, g::Vector{Float64}) @ ADNLPModels C:\Users\mgoll\ADNLPModels.jl\src\nlp.jl:544
│ runtime dispatch detected: ADNLPModels.gradient!(%39::Any, g::Vector{Float64}, %40::Any, x::Vector{Float64})::Any
```
To fix this, we should make the type of the adbackend field of the ADNLPModel struct a type parameter.
Currently, the struct is defined as
https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/blob/a16ccc3c67da86a9e73aeceb9f9a2aa92a26d554/src/nlp.jl#L3-L16
I suggest changing this to
```julia
mutable struct ADNLPModel{T, S, Si, F1, F2, ADMB <: ADModelBackend} <: AbstractADNLPModel{T, S}
  meta::NLPModelMeta{T, S}
  counters::Counters
  adbackend::ADMB
  # Functions
  f::F1
  clinrows::Si
  clincols::Si
  clinvals::S
  c!::F2
end
```
Of course, with this modification it will no longer be possible to change the type of adbackend once the ADNLPModel has been constructed.
In particular, this breaks
https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/blob/a16ccc3c67da86a9e73aeceb9f9a2aa92a26d554/src/ADNLPModels.jl#L253-L255
when typeof(new_adbackend) != typeof(nlp.adbackend), which is the whole point of this method.
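Under the parametric design, the alternative is a non-mutating variant that rebuilds the thin wrapper while sharing the old fields. A toy sketch (hypothetical simplified types, not the actual ADNLPModels API):

```julia
# Toy sketch, not the real ADNLPModel: `backend` is a type parameter, so
# switching to a backend of a different type requires a new wrapper.
struct ToyModel{B}
    backend::B
    data::Vector{Float64}   # stands in for meta, counters, f, c!, ...
end

# Non-mutating replacement: only the wrapper is new, `data` is recycled.
set_backend(m::ToyModel, new_backend) = ToyModel(new_backend, m.data)

m1 = ToyModel(:forward, [1.0, 2.0])
m2 = set_backend(m1, :reverse)

println(m2.backend)           # :reverse
println(m2.data === m1.data)  # true: storage is shared, not copied
```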
On the other hand, we gain a significant speedup. For instance, with the current version,
```julia
using ADNLPModels, NLPModels, OptimizationProblems, OptimizationProblems.ADNLPProblems, BenchmarkTools
nlp = bt1()
x = zeros(2)
g = zeros(2)
@btime grad!(nlp, x, g)
```
```
173.853 ns (3 allocations: 112 bytes)
2-element Vector{Float64}:
 -1.0
  0.0
```
while my version gives
```julia
using ADNLPModels, NLPModels, OptimizationProblems, OptimizationProblems.ADNLPProblems, BenchmarkTools
nlp = bt1()
x = zeros(2)
g = zeros(2)
@btime grad!(nlp, x, g)
```
```
34.573 ns (0 allocations: 0 bytes)
2-element Vector{Float64}:
 -1.0
  0.0
```
which is a 5× speedup in this case. I believe this cannot be ignored.
I am also wondering whether we should make the ADNLPModel struct immutable in that case.
I am fine with your modifications, Maxence; we just need to document that set_adbackend! returns a new ADNLPModel.
@MaxenceGollier I didn’t notice this issue three weeks ago, sorry about that.
It’s a major issue to fix, so feel free to tag me if you need a review.
We should rename set_adbackend! to set_adbackend if we are no longer updating the old nlp.
I don't like having set_adbackend! create a new ADNLPModel; it is too implicit.
I think it would still work if you pass a backend of the same type, so let it stay like this, no?
There were some use cases of solving minor modifications of a problem without re-allocating the whole ADNLPModel, so that would still be useful. It won't work with the most useful backend, but it will with the most versatile ones.
I agree with what you propose, @MaxenceGollier. I don't think there is a way to be more type-stable and still switch easily between the predefined backends. If somebody wants to switch backends, they will have to design a tailored one.
@MaxenceGollier could you also evaluate the impact of having it non-mutable? You can get inspiration from https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/tree/main/benchmark to test it on more and larger problems.
@tmigot when we create a new ADNLPModel, we can recycle the content of the previous one. It doesn't mean that we reallocate everything from scratch; we just need a new instance of ADNLPModel for type stability.
Yes, but if you recreate a new ADNLPModel, you should use a constructor for that. Using set_adbackend for this would hide too much implicit machinery.
It is fine with me to have a new ADNLPModel after a call to set_adbackend. It does not seem too implicit to me.
@MaxenceGollier what is your plan for this? If you are not planning to work on it in the coming month, maybe @amontoison or I will start something. Do you have some code already for a draft PR so we can help?
I have some code, yes; I will open a draft PR.
The thing is that my benchmark results are not really satisfactory on some instances. I forgot to keep you informed, sorry. I will share the benchmarks on the PR.