Ipopt._VectorNonlinearOracle: usage and future plans
Ipopt.jl v1.8.0 adds the Ipopt._VectorNonlinearOracle set. The purpose of this issue is to explain our motivation for adding the set and what we plan to do with it in the future.
Current status and future plans
Ipopt._VectorNonlinearOracle is experimental. You are encouraged to try it out and report back experiences (see usage instructions below). It is marked as experimental (it begins with an underscore) because we plan to make a change to the API no later than September 30, 2025 (six months after adding it).
We are considering three options:
- Rename the set to Ipopt.VectorNonlinearFunction and publicly support it as part of the v1.X API of Ipopt.jl
- Move the set to MathOptInterface as part of the public API for MathOptInterface.jl, and add support for it in other packages like KNITRO.jl and NLopt.jl
- Remove the set in favor of a different approach to vector-valued user-defined functions.
If the set is useful for a small number of users who use only Ipopt.jl, and no one shows wider interest, we will choose (1).
If the set is useful to a number of different groups, including people who want to use it with different NLP solvers, and the API proves to be a good design, we will choose (2).
If issues arise, we reserve the right to delete the set entirely and pursue a different approach in (3).
Thus, if you use this set, please comment below with feedback on how you are using it and how we could improve it.
Compatibility
The _VectorNonlinearOracle feature is experimental. It relies on a private API feature of Ipopt.jl that will change in a future release. If you use this feature, you must pin the version of Ipopt.jl in your Project.toml to ensure that future updates to Ipopt.jl do not break your existing code.
A known good version of Ipopt.jl is v1.8.0. Pin the version using:
[compat]
Ipopt = "=1.8.0"
Definition
Ipopt._VectorNonlinearOracle is defined as:
_VectorNonlinearOracle(;
dimension::Int,
l::Vector{Float64},
u::Vector{Float64},
eval_f::Function,
jacobian_structure::Vector{Tuple{Int,Int}},
eval_jacobian::Function,
hessian_lagrangian_structure::Vector{Tuple{Int,Int}} = Tuple{Int,Int}[],
eval_hessian_lagrangian::Union{Nothing,Function} = nothing,
) <: MOI.AbstractVectorSet
which represents the set:
S = \{x \in \mathbb{R}^{\text{dimension}} : l \le f(x) \le u\}
where the vectors l and u give the element-wise lower and upper bounds, and f is defined by the callback oracles eval_f, eval_jacobian, and eval_hessian_lagrangian.
Function
The eval_f function must have the signature
eval_f(ret::AbstractVector, x::AbstractVector)::Nothing
which fills $f(x)$ into the dense vector ret.
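For illustration only (this f is a made-up example, not part of the package), for $f(x) = (x_1 x_2, x_2^2)$ an eval_f could be:

```julia
# Fills ret[i] = f_i(x) for the example f(x) = (x[1] * x[2], x[2]^2).
function eval_f(ret::AbstractVector, x::AbstractVector)
    ret[1] = x[1] * x[2]
    ret[2] = x[2]^2
    return
end
```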
Jacobian
The eval_jacobian function must have the signature
eval_jacobian(ret::AbstractVector, x::AbstractVector)::Nothing
which fills the sparse Jacobian $\nabla f(x)$ into ret.
The one-indexed sparsity structure must be provided in the jacobian_structure argument.
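Continuing the made-up example $f(x) = (x_1 x_2, x_2^2)$ from above, the structural nonzeros are $\partial f_1/\partial x_1 = x_2$, $\partial f_1/\partial x_2 = x_1$, and $\partial f_2/\partial x_2 = 2 x_2$, so a sketch is:

```julia
# ret[k] must hold the value of the nonzero located at jacobian_structure[k].
jacobian_structure = [(1, 1), (1, 2), (2, 2)]
function eval_jacobian(ret::AbstractVector, x::AbstractVector)
    ret[1] = x[2]      # ∂f1/∂x1 at (1, 1)
    ret[2] = x[1]      # ∂f1/∂x2 at (1, 2)
    ret[3] = 2 * x[2]  # ∂f2/∂x2 at (2, 2)
    return
end
```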
Hessian
The eval_hessian_lagrangian function is optional.
If eval_hessian_lagrangian === nothing, Ipopt will use a Hessian approximation instead of the exact Hessian.
If eval_hessian_lagrangian is a function, it must have the signature
eval_hessian_lagrangian(
ret::AbstractVector,
x::AbstractVector,
μ::AbstractVector,
)::Nothing
which fills the sparse Hessian of the Lagrangian $\sum \mu_i \nabla^2 f_i(x)$ into ret.
The one-indexed sparsity structure must be provided in the hessian_lagrangian_structure argument.
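For the same made-up example $f(x) = (x_1 x_2, x_2^2)$, the nonzeros of $\sum \mu_i \nabla^2 f_i(x)$ are $\mu_1$ at (1, 2) and $2 \mu_2$ at (2, 2). Assuming one triangle of the symmetric matrix is supplied (the ForwardDiff example below uses the upper triangle), a sketch is:

```julia
# ret[k] must hold the value located at hessian_lagrangian_structure[k].
hessian_lagrangian_structure = [(1, 2), (2, 2)]
function eval_hessian_lagrangian(ret, x, μ)
    ret[1] = μ[1]      # μ[1] * ∂²f1/∂x1∂x2
    ret[2] = 2 * μ[2]  # μ[2] * ∂²f2/∂x2²
    return
end
```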
Use with JuMP
This section provides some usage examples for JuMP.
Example: f(x) <= 1
The simplest example is to represent the constraint x^2 + y^2 <= 1 in the problem:
\begin{aligned}
\max\;\; & x + y \\
\text{s.t.}\;\; & x^2 + y^2 \le 1
\end{aligned}
This can be achieved as follows:
julia> using JuMP, Ipopt
julia> set = Ipopt._VectorNonlinearOracle(;
dimension = 2,
l = [-Inf],
u = [1.0],
eval_f = (ret, x) -> (ret[1] = x[1]^2 + x[2]^2),
jacobian_structure = [(1, 1), (1, 2)],
eval_jacobian = (ret, x) -> ret .= 2.0 .* x,
hessian_lagrangian_structure = [(1, 1), (2, 2)],
eval_hessian_lagrangian = (ret, x, u) -> ret .= 2.0 .* u[1],
);
julia> begin
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, 0 <= x <= 1)
@variable(model, 0 <= y <= 1)
@objective(model, Max, x + y)
@constraint(model, [x, y] in set)
optimize!(model)
assert_is_solved_and_feasible(model)
value(x), value(y), 1 / sqrt(2)
end
(0.707106783471979, 0.707106783471979, 0.7071067811865475)
Example: y = F(x)
To model $y = F(x)$, we need to implement the set $[x, y] \in S$ such that $0 \le F(x) - y \le 0$.
We can modify our previous example to:
\begin{aligned}
\max\;\; & x + y \\
\text{s.t.}\;\; & x^2 + y^2 - z = 0 \\
& z \le 1
\end{aligned}
which is implemented as follows:
julia> using JuMP, Ipopt
julia> set = Ipopt._VectorNonlinearOracle(;
dimension = 3,
l = [0.0],
u = [0.0],
eval_f = (ret, x) -> (ret[1] = x[1]^2 + x[2]^2 - x[3]),
jacobian_structure = [(1, 1), (1, 2), (1, 3)],
eval_jacobian = (ret, x) -> begin
ret[1:2] .= 2.0 .* x[1:2]
ret[3] = -1.0
end,
hessian_lagrangian_structure = [(1, 1), (2, 2)],
eval_hessian_lagrangian = (ret, x, u) -> ret .= 2.0 .* u[1],
);
julia> begin
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, 0 <= x <= 1)
@variable(model, 0 <= y <= 1)
@variable(model, z <= 1)
@objective(model, Max, x + y)
# x^2 + y^2 - z == 0
@constraint(model, [x, y, z] in set)
optimize!(model)
assert_is_solved_and_feasible(model)
value(x), value(y), value(z), 1 / sqrt(2)
end
(0.707106783471979, 0.707106783471979, 1.0000000064625545, 0.7071067811865475)
Example: multiple rows
A more involved example is:
\begin{aligned}
x^2 - a &= 0 \\
y^2 + z^3 - b &= 0
\end{aligned}
which is implemented as follows:
julia> set = Ipopt._VectorNonlinearOracle(;
dimension = 5,
l = [0.0, 0.0],
u = [0.0, 0.0],
eval_f = (ret, x) -> begin
ret[1] = x[1]^2 - x[4]
ret[2] = x[2]^2 + x[3]^3 - x[5]
return
end,
jacobian_structure = [(1, 1), (2, 2), (2, 3), (1, 4), (2, 5)],
eval_jacobian = (ret, x) -> begin
ret[1] = 2 * x[1]
ret[2] = 2 * x[2]
ret[3] = 3 * x[3]^2
ret[4] = -1.0
ret[5] = -1.0
return
end,
hessian_lagrangian_structure = [(1, 1), (2, 2), (3, 3)],
eval_hessian_lagrangian = (ret, x, u) -> begin
ret[1] = 2 * u[1]
ret[2] = 2 * u[2]
ret[3] = 6 * x[3] * u[2]
return
end,
);
julia> begin
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[i in 1:3] == i)
@variable(model, y[1:2])
@constraint(model, [x; y] in set)
optimize!(model)
assert_is_solved_and_feasible(model)
value.(x), value.(y)
end
([1.0, 2.0, 3.0], [1.0, 31.0])
Example: no Hessian
The Hessian evaluator is optional:
julia> using JuMP, Ipopt
julia> set = Ipopt._VectorNonlinearOracle(;
dimension = 2,
l = [-Inf],
u = [1.0],
eval_f = (ret, x) -> (ret[1] = x[1]^2 + x[2]^2),
jacobian_structure = [(1, 1), (1, 2)],
eval_jacobian = (ret, x) -> ret .= 2.0 .* x,
);
julia> begin
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, 0 <= x <= 1)
@variable(model, 0 <= y <= 1)
@objective(model, Max, x + y)
@constraint(model, [x, y] in set)
optimize!(model)
assert_is_solved_and_feasible(model)
value(x), value(y), 1 / sqrt(2)
end
(0.7071067847123265, 0.7071067847123265, 0.7071067811865475)
Example: vector-valued user-defined function and ForwardDiff
Constructing the Jacobian and Hessian by hand is tedious and error-prone. Here is an example that uses ForwardDiff.jl:
julia> using JuMP, Ipopt, ForwardDiff
julia> function build_set_from_f(f; input_dimension, output_dimension)
input_indices = 1:input_dimension
output_indices = input_dimension .+ (1:output_dimension)
return Ipopt._VectorNonlinearOracle(;
dimension = input_dimension + output_dimension,
l = zeros(output_dimension),
u = zeros(output_dimension),
# eval_f := f(x) - y
eval_f = (ret, args) -> begin
ret .= f(args[input_indices]) .- args[output_indices]
end,
# eval_jacobian := [∇f(x) -I]
jacobian_structure = vcat(
[(i, j) for j in 1:input_dimension for i in 1:output_dimension],
[(j, input_dimension + j) for j in 1:output_dimension],
),
eval_jacobian = (ret, args) -> begin
ret .= vcat(
vec(ForwardDiff.jacobian(f, args[input_indices])),
fill(-1.0, output_dimension),
)
end,
# eval_hessian_lagrangian := ∑ uᵢ ∇²fᵢ(x) := ∇²(u'f(x))
hessian_lagrangian_structure = [
(i, j) for j in 1:input_dimension for i in 1:input_dimension if j >= i
],
eval_hessian_lagrangian = (ret, args, u) -> begin
H = ForwardDiff.hessian(x -> u' * f(x), args[input_indices])
k = 0
for j in 1:input_dimension
for i in 1:j
k += 1
ret[k] = H[i, j]
end
end
end,
)
end
build_set_from_f (generic function with 1 method)
julia> f(x) = [x[1]^2, x[2]^2 + x[3]^3]
f (generic function with 1 method)
julia> set = build_set_from_f(f; input_dimension = 3, output_dimension = 2);
julia> begin
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[i in 1:3] == i)
@variable(model, y[1:2])
@constraint(model, [x; y] in set)
optimize!(model)
assert_is_solved_and_feasible(model)
f(value.(x)), value.(y)
end
([1.0, 31.0], [1.0, 31.0])
Excited to test this new feature ASAP!
> The eval_jacobian function must have the signature
> eval_jacobian(ret::AbstractVector, x::AbstractVector)::Nothing

Did you mean ret::AbstractMatrix? If so, there is a similar mistake in the Hessian section.

Nope. ret is a vector, with the non-zeros in the same order as jacobian_structure and hessian_lagrangian_structure.
I have a basic question: what's the meaning of "oracle"? The same thing as a vector field?
(P.S.: my background is electrical engineering, not operational research.)
Oracle meaning something you can ask a question of and get back an answer. We could have named this "VectorNonlinearCallback" but oracle is a fairly (?) standard term-of-art (if hard to search for because of a certain company). Nothing to do with fields.
(Okay, I've never actually thought of how someone might try to google this. "What is Oracle + Software?" Nope. "Oracle + query?" Nope.)
(Another edit: https://math.stackexchange.com/questions/2613109/what-is-the-purpose-of-an-oracle-in-optimization, https://en.wikipedia.org/wiki/Oracle_complexity_(optimization))
(Another-nother edit: turns out Oracle (the company) can solve NLPs with the "Oracle Decision Optimizer OptQuest" https://docs.oracle.com/cd/E57185_01/CBREG/ch07s02s01s04.html)
Alright, thanks! First time I'm hearing it (I'm French also, so that may not help).
I tried to google the term and I failed miserably, hindered by this certain company haha.
Sometimes things go the other way. In the stochastic programming world, they use terms like "hazard-decision" to mean "observe the uncertainty first, then make a decision", but the term is meaningless in English, and thoroughly confused me until I worked out it was a transliteration of the French "hasard"
To add to @odow's explanation, the main advantage of this feature over add_nonlinear_operator is in the case where:
- you would like to use a vector-valued external function with second derivatives, and
- you can compute the Hessian of the Lagrangian (the weighted sum of each output Hessian) more efficiently than the naive approach of evaluating each output Hessian individually. This can often be accomplished using adjoint methods (see the sketch below).
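As an illustrative sketch of why the weighted sum matters (using ForwardDiff.jl, as elsewhere in this thread; the function names are mine), both formulations below compute the same matrix, but the second needs only one Hessian evaluation of a scalar function, and with adjoint (reverse-mode) methods it can be much cheaper than one Hessian per output:

```julia
using ForwardDiff

f(x) = [x[1]^2, x[2]^2 + x[3]^3]  # an example vector-valued function

# Naive: one dense Hessian per output, then the weighted sum.
hess_naive(x, μ) =
    sum(μ[i] * ForwardDiff.hessian(z -> f(z)[i], x) for i in eachindex(μ))

# Weighted: a single Hessian of the scalar function μ' * f(x).
hess_weighted(x, μ) = ForwardDiff.hessian(z -> μ' * f(z), x)
```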
Hey @odow, nice feature and syntax!
Is this new set part of the solution to use multithreaded callbacks in Knitro? (I'm thinking about https://github.com/jump-dev/KNITRO.jl/issues/93#issuecomment-1841653317)
I think no. There are many problems with a multithreaded callback that I don't see us resolving anytime soon.
So I still want to test the interface, but I have been quite busy lately, so I have not had the time yet.
My main usage would be for ModelPredictiveControl.jl. It is based on JuMP.jl mainly because of the solver-independence feature. Because of this, my vote would go to option 2 or 3 above.
I have two questions on the experimental functionality:
- Why did you choose to store the Hessian as a vector + tuple of indices? Why not use Julia's standard library SparseArrays? It seems that it will clash with OffsetArrays.jl (but I agree that it's a low-priority feature).
- Are you aware of a simple way of converting a Hessian from a SparseMatrixCSC{Tv,Ti} to a vector + tuple of Ints? I'm asking because I use DifferentiationInterface.jl, and the sparse differentiation backends return a SparseMatrixCSC{Tv,Ti}.
Also @Robbybp, could you elaborate a bit more on "adjoint methods"? I tried to find references on this in the context of the Hessian of a Lagrangian, but I found nothing. Thanks!
> Why did you choose to store the Hessian as a vector + tuple of indices?

Because it was simple, and because that's the format that MOI requires.

> It seems that it will clash with OffsetArrays.jl

Yes. We explicitly chose Base.Vector so that we don't need to worry about AbstractArray types and their various methods/MethodErrors. We will never support AbstractArray here.

> Are you aware of a simple way of converting a Hessian?
julia> s = SparseArrays.sparse([1, 1, 2], [1, 2, 2], rand(3))
2×2 SparseMatrixCSC{Float64, Int64} with 3 stored entries:
0.725471 0.556755
⋅ 0.601829
julia> I, J, V = SparseArrays.findnz(s)
([1, 1, 2], [1, 2, 2], [0.7254708507838116, 0.5567545415928516, 0.6018292600385067])
julia> indices = collect(zip(I, J))
3-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(1, 2)
(2, 2)
julia> V
3-element Vector{Float64}:
0.7254708507838116
0.5567545415928516
0.6018292600385067
> because I use DifferentiationInterface.jl, and the sparse differentiation backends return

Symmetric? Non-symmetric?
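(If the backend stores both triangles of a symmetric Hessian, a minimal sketch for keeping one triangle before building the structure; the example matrix H is mine, not a DifferentiationInterface.jl API:)

```julia
using SparseArrays

H = sparse([1, 1, 2, 2], [1, 2, 1, 2], [2.0, 1.0, 1.0, 4.0])  # symmetric example
I, J, V = findnz(H)
keep = [k for k in eachindex(I) if I[k] >= J[k]]  # keep the lower triangle only
structure = collect(zip(I[keep], J[keep]))        # Vector{Tuple{Int,Int}}
values = V[keep]
```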
> could you elaborate a bit more on "adjoint methods"?

See https://github.com/lanl-ansi/MathOptAI.jl/blob/7c6b567bde1ac2923939193332d055cc0ea501ca/ext/MathOptAIIpoptPythonCallExt.jl#L86-L123
Okay, I did some tests without exact Hessians, since this is what I do right now in MPC.jl (using the memoization trick of the JuMP tutorial). It's a fairer comparison.
- First, good work! I did not notice any major bug in the implementation.
- If our objective function is defined with a user-defined nonlinear operator, are we still limited to the splatting syntax? I guess the impact on performance here is way lower than for nonlinear constraints...
- The way _VectorNonlinearOracle sets are printed in the REPL, with the temp names of the anonymous functions:

  julia> set
  Ipopt._VectorNonlinearOracle(3, 1, [0.0], [0.0], var"#1#4"(), [(1, 1), (1, 2), (1, 3)], var"#2#5"(), [(1, 1), (2, 2)], var"#3#6"(), [0.0, 0.0, 0.0])

  is simple enough, but it is problematic when the functions are defined in closures. It is common to see them in stack traces. In my use case, it prints more than 10000 lines! The default maximum of VS Code is 10000 lines, so if there is an exception, everything becomes unreadable. I guess you will add pretty-printing later?
- This is preliminary and my code is super dirty right now, but the performance gains compared to the old implementation with splatting + memoization are already there! Look for yourself here (the CI is still running right now), or locally on my computer:
| benchmark | main | dirty | main / dirty |
|---|---|---|---|
| CASE STUDIES/PredictiveController/Pendulum/NonLinMPC/Custom constraints/Ipopt/MultipleShooting | 1.03 ± 0.0057 s | 0.378 ± 0.029 s | 2.72 ± 0.21 |
| CASE STUDIES/PredictiveController/Pendulum/NonLinMPC/Custom constraints/Ipopt/SingleShooting | 0.841 ± 0.01 s | 0.792 ± 0.019 s | 1.06 ± 0.029 |
| CASE STUDIES/PredictiveController/Pendulum/NonLinMPC/Economic/Ipopt/MultipleShooting | 0.499 ± 0.0049 s | 0.189 ± 0.029 s | 2.65 ± 0.41 |
| CASE STUDIES/PredictiveController/Pendulum/NonLinMPC/Noneconomic/Ipopt/MultipleShooting | 0.465 ± 0.013 s | 0.176 ± 0.026 s | 2.64 ± 0.4 |
It's about 2.5× faster on average. The more nonlinear constraints there are, the higher the speed gain (see e.g. custom constraints with multiple shooting).
So overall, this set is a huge improvement IMO, not only for efficient Hessian handling, but straight-up for efficient nonlinear constraints, with or without exact Hessians. It should be available for all NLP solvers in the MOI ecosystem.
> If our objective function is defined with a user-defined nonlinear operator, are we still limited to the splatting syntax?

Yes. You could do an epigraph reformulation though, with min t; t >= f(x).
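A minimal sketch of that epigraph trick with the oracle set, reusing f(x) = x^2 + y^2 from the first example (the variable bounds are only there to keep the problem bounded):

```julia
using JuMP, Ipopt

# min t  s.t.  f(x) - t <= 0, with f(x) = x[1]^2 + x[2]^2
epigraph_set = Ipopt._VectorNonlinearOracle(;
    dimension = 3,  # the constraint acts on [x[1], x[2], t]
    l = [-Inf],
    u = [0.0],
    eval_f = (ret, v) -> (ret[1] = v[1]^2 + v[2]^2 - v[3]),
    jacobian_structure = [(1, 1), (1, 2), (1, 3)],
    eval_jacobian = (ret, v) -> (ret[1] = 2v[1]; ret[2] = 2v[2]; ret[3] = -1.0),
    hessian_lagrangian_structure = [(1, 1), (2, 2)],
    eval_hessian_lagrangian = (ret, v, μ) -> (ret .= 2.0 .* μ[1]),
)
model = Model(Ipopt.Optimizer)
@variable(model, 1 <= x[1:2] <= 2)
@variable(model, t)
@constraint(model, [x; t] in epigraph_set)
@objective(model, Min, t)
```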
> I guess you will add pretty-printing later

Yes, we can do that.
> Yes. You could do an epigraph reformulation though, with min t; t >= f(x).

I just learned something! My guess would have been an equality constraint, but it is really an inequality here, to preserve convexity, from what I read. I'm not sure it would be worth it, though: I would lose code readability, and one single splatted function for the objective is presumably fast enough, compared to multiple functions for the nonlinear constraints.
Would you be interested in benchmarks including exact Hessians (with sparse ForwardDiff based on DI.jl)?
Otherwise, I will work on other new features in the meantime, before the official public release of the set.
Well. It's the 30th of September. So I guess we need to make a decision. I'm personally in favour of adding it as a set to MOI.
For my research, this is a win with the use of PyTorch surrogates via MathOptAI and other external functions in optimal control problems.
@odow I am working on the MOI interface for Uno.jl, and I added Jacobian-vector / Hessian-vector operators to the oracle.
If you merge it into MOI.jl, it would be great to also have support for these operators.
To the best of my knowledge, we don't need them in MadNLP.jl / Ipopt.jl, but they are supported in Uno.jl / BQPD.
https://github.com/cvanaret/Uno/pull/310
Here's my proposal to add it to MOI: https://github.com/jump-dev/MathOptInterface.jl/pull/2860
I think it is a relevant set for the nonlinear solvers!