
Ipopt._VectorNonlinearOracle: usage and future plans

Open odow opened this issue 9 months ago • 9 comments

Ipopt v1.8.0 adds the Ipopt._VectorNonlinearOracle set. The purpose of this issue is to explain our motivation for adding the set, and what we plan to do about it in the future.

Current status and future plans

Ipopt._VectorNonlinearOracle is experimental. You are encouraged to try it out and report back experiences (see usage instructions below). It is marked as experimental (it begins with an underscore) because we plan to make a change to the API no later than September 30, 2025 (six months after adding it).

We are considering three options:

  1. Rename the set to Ipopt.VectorNonlinearOracle (dropping the underscore) and publicly support it as part of the v1.X API of Ipopt.jl
  2. Move the set to MathOptInterface as part of the public API for MathOptInterface.jl, and add support for it in other packages like KNITRO.jl and NLopt.jl
  3. Remove the set in favor of a different approach to vector-valued user-defined functions.

If the set is useful for a small number of users who use only Ipopt.jl, and no one shows wider interest, we will choose (1).

If the set is useful for a number of different groups, including people who want to use it with different NLP solvers, and the API proves to be a good design, we will choose (2).

If issues arise, we reserve the right to delete the set entirely and pursue a different approach in (3).

Thus, if you use this set, please comment below with feedback on how you are using it and whether we could improve it.

Compatibility

The _VectorNonlinearOracle feature is experimental. It relies on a private API feature of Ipopt.jl that will change in a future release. If you use this feature, you must pin the version of Ipopt.jl in your Project.toml to ensure that future updates to Ipopt.jl do not break your existing code.

A known good version of Ipopt.jl is v1.8.0. Pin the version using:

[compat]
Ipopt = "=1.8.0"

Definition

Ipopt._VectorNonlinearOracle is defined as:

_VectorNonlinearOracle(;
    dimension::Int,
    l::Vector{Float64},
    u::Vector{Float64},
    eval_f::Function,
    jacobian_structure::Vector{Tuple{Int,Int}},
    eval_jacobian::Function,
    hessian_lagrangian_structure::Vector{Tuple{Int,Int}} = Tuple{Int,Int}[],
    eval_hessian_lagrangian::Union{Nothing,Function} = nothing,
) <: MOI.AbstractVectorSet

which represents the set:

S = \{x \in \mathbb{R}^{\text{dimension}} : l \le f(x) \le u\}

where the bounds are given by the vectors l and u, and f is defined by the callback oracles eval_f, eval_jacobian, and eval_hessian_lagrangian.

Function

The eval_f function must have the signature

eval_f(ret::AbstractVector, x::AbstractVector)::Nothing

which fills $f(x)$ into the dense vector ret.
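For example, a minimal sketch for the illustrative single-output function $f(x) = [x_1 x_2]$:

function eval_f(ret::AbstractVector, x::AbstractVector)
    ret[1] = x[1] * x[2]  # f has one output, so ret has one entry
    return
end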

Jacobian

The eval_jacobian function must have the signature

eval_jacobian(ret::AbstractVector, x::AbstractVector)::Nothing

which fills the sparse Jacobian $\nabla f(x)$ into ret.

The one-indexed sparsity structure must be provided in the jacobian_structure argument.
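Continuing the sketch for $f(x) = [x_1 x_2]$: the Jacobian is the 1×2 matrix $[x_2 \;\; x_1]$, and ret[k] must hold the value at the (row, column) position jacobian_structure[k]:

jacobian_structure = [(1, 1), (1, 2)]  # both entries of row 1 are structural non-zeros
function eval_jacobian(ret::AbstractVector, x::AbstractVector)
    ret[1] = x[2]  # value at (1, 1): ∂f₁/∂x₁
    ret[2] = x[1]  # value at (1, 2): ∂f₁/∂x₂
    return
end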

Hessian

The eval_hessian_lagrangian function is optional.

If eval_hessian_lagrangian === nothing, Ipopt will use a Hessian approximation instead of the exact Hessian.

If eval_hessian_lagrangian is a function, it must have the signature

eval_hessian_lagrangian(
    ret::AbstractVector,
    x::AbstractVector,
    μ::AbstractVector,
)::Nothing

which fills the sparse Hessian of the Lagrangian $\sum \mu_i \nabla^2 f_i(x)$ into ret.

The one-indexed sparsity structure must be provided in the hessian_lagrangian_structure argument.
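Continuing the same sketch: $\nabla^2 f_1(x)$ has a single off-diagonal non-zero, and, as in the examples later in this post, only one triangle of the symmetric matrix is declared:

hessian_lagrangian_structure = [(1, 2)]  # the upper-triangular non-zero of ∇²f₁
function eval_hessian_lagrangian(ret::AbstractVector, x::AbstractVector, μ::AbstractVector)
    ret[1] = μ[1]  # μ₁ * ∂²f₁/∂x₁∂x₂
    return
end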

Use with JuMP

This section provides some usage examples for JuMP.

Example: f(x) <= 1

The simplest example is to represent the constraint x^2 + y^2 <= 1 in the problem:

\begin{aligned}
\max\quad & x + y \\
\text{s.t.}\quad & x^2 + y^2 \le 1
\end{aligned}

This can be achieved as follows:

julia> using JuMP, Ipopt

julia> set = Ipopt._VectorNonlinearOracle(;
           dimension = 2,
           l = [-Inf],
           u = [1.0],
           eval_f = (ret, x) -> (ret[1] = x[1]^2 + x[2]^2),
           jacobian_structure = [(1, 1), (1, 2)],
           eval_jacobian = (ret, x) -> ret .= 2.0 .* x,
           hessian_lagrangian_structure = [(1, 1), (2, 2)],
           eval_hessian_lagrangian = (ret, x, u) -> ret .= 2.0 .* u[1],
       );

julia> begin
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, 0 <= x <= 1)
           @variable(model, 0 <= y <= 1)
           @objective(model, Max, x + y)
           @constraint(model, [x, y] in set)
           optimize!(model)
           assert_is_solved_and_feasible(model)
           value(x), value(y), 1 / sqrt(2)
       end
(0.707106783471979, 0.707106783471979, 0.7071067811865475)

Example: y = F(x)

To model $y = F(x)$, we add the constraint $[x, y] \in S$, where $S = \{(x, y) : 0 \le F(x) - y \le 0\}$.

We can modify our previous example to:

\begin{aligned}
\max\quad & x + y \\
\text{s.t.}\quad & x^2 + y^2 - z = 0 \\
 & z \le 1
\end{aligned}

which is implemented as follows:

julia> using JuMP, Ipopt

julia> set = Ipopt._VectorNonlinearOracle(;
           dimension = 3,
           l = [0.0],
           u = [0.0],
           eval_f = (ret, x) -> (ret[1] = x[1]^2 + x[2]^2 - x[3]),
           jacobian_structure = [(1, 1), (1, 2), (1, 3)],
           eval_jacobian = (ret, x) -> begin
               ret[1:2] .= 2.0 .* x[1:2]
               ret[3] = -1.0
           end,
           hessian_lagrangian_structure = [(1, 1), (2, 2)],
           eval_hessian_lagrangian = (ret, x, u) -> ret .= 2.0 .* u[1],
       );

julia> begin
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, 0 <= x <= 1)
           @variable(model, 0 <= y <= 1)
           @variable(model, z <= 1)
           @objective(model, Max, x + y)
           # x^2 + y^2 - z == 0
           @constraint(model, [x, y, z] in set)
           optimize!(model)
           assert_is_solved_and_feasible(model)
           value(x), value(y), value(z), 1 / sqrt(2)
       end
(0.707106783471979, 0.707106783471979, 1.0000000064625545, 0.7071067811865475)

Example: multiple rows

A more involved example is:

\begin{aligned}
x^2 - a & = 0 \\
y^2 + z^3 - b & = 0
\end{aligned}

which is implemented as follows:

julia> set = Ipopt._VectorNonlinearOracle(;
           dimension = 5,
           l = [0.0, 0.0],
           u = [0.0, 0.0],
           eval_f = (ret, x) -> begin
               ret[1] = x[1]^2 - x[4]
               ret[2] = x[2]^2 + x[3]^3 - x[5]
               return
           end,
           jacobian_structure = [(1, 1), (2, 2), (2, 3), (1, 4), (2, 5)],
           eval_jacobian = (ret, x) -> begin
               ret[1] = 2 * x[1]
               ret[2] = 2 * x[2]
               ret[3] = 3 * x[3]^2
               ret[4] = -1.0
               ret[5] = -1.0
               return
           end,
           hessian_lagrangian_structure = [(1, 1), (2, 2), (3, 3)],
           eval_hessian_lagrangian = (ret, x, u) -> begin
               ret[1] = 2 * u[1]
               ret[2] = 2 * u[2]
               ret[3] = 6 * x[3] * u[2]
               return
           end,
       );

julia> begin
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x[i in 1:3] == i)
           @variable(model, y[1:2])
           @constraint(model, [x; y] in set)
           optimize!(model)
           assert_is_solved_and_feasible(model)
           value.(x), value.(y)
       end
([1.0, 2.0, 3.0], [1.0, 31.0])

Example: no Hessian

The Hessian evaluator is optional:

julia> using JuMP, Ipopt

julia> set = Ipopt._VectorNonlinearOracle(;
           dimension = 2,
           l = [-Inf],
           u = [1.0],
           eval_f = (ret, x) -> (ret[1] = x[1]^2 + x[2]^2),
           jacobian_structure = [(1, 1), (1, 2)],
           eval_jacobian = (ret, x) -> ret .= 2.0 .* x,
       );

julia> begin
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, 0 <= x <= 1)
           @variable(model, 0 <= y <= 1)
           @objective(model, Max, x + y)
           @constraint(model, [x, y] in set)
           optimize!(model)
           assert_is_solved_and_feasible(model)
           value(x), value(y), 1 / sqrt(2)
       end
(0.7071067847123265, 0.7071067847123265, 0.7071067811865475)

Example: vector-valued user-defined function and ForwardDiff

Constructing the Jacobian and Hessian is tedious and error prone. Here is an example that uses ForwardDiff.jl:

julia> using JuMP, Ipopt, ForwardDiff

julia> function build_set_from_f(f; input_dimension, output_dimension)
           input_indices = 1:input_dimension
           output_indices = input_dimension .+ (1:output_dimension)
           return Ipopt._VectorNonlinearOracle(;
               dimension = input_dimension + output_dimension,
               l = zeros(output_dimension),
               u = zeros(output_dimension),
               # eval_f := f(x) - y
               eval_f = (ret, args) -> begin
                   ret .= f(args[input_indices]) .- args[output_indices]
               end,
               # eval_jacobian := [∇f(x) -I]
               jacobian_structure = vcat(
                   [(i, j) for j in 1:input_dimension for i in 1:output_dimension],
                   [(j, input_dimension + j) for j in 1:output_dimension],
               ),
               eval_jacobian = (ret, args) -> begin
                   ret .= vcat(
                       vec(ForwardDiff.jacobian(f, args[input_indices])),
                       fill(-1.0, output_dimension),
                   )
               end,
               # eval_hessian_lagrangian := ∑ uᵢ ∇²fᵢ(x) := ∇²(u'f(x))
               hessian_lagrangian_structure = [
                   (i, j) for j in 1:input_dimension for i in 1:input_dimension if j >= i
               ],
               eval_hessian_lagrangian = (ret, args, u) -> begin
                   H = ForwardDiff.hessian(x -> u' * f(x), args[input_indices])
                   k = 0
                   for j in 1:input_dimension
                       for i in 1:j
                           k += 1
                           ret[k] = H[i, j]
                       end
                   end
               end,
           )
       end
build_set_from_f (generic function with 1 method)

julia> f(x) = [x[1]^2, x[2]^2 + x[3]^3]
f (generic function with 1 method)

julia> set = build_set_from_f(f; input_dimension = 3, output_dimension = 2);

julia> begin
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x[i in 1:3] == i)
           @variable(model, y[1:2])
           @constraint(model, [x; y] in set)
           optimize!(model)
           assert_is_solved_and_feasible(model)
           f(value.(x)), value.(y)
       end
([1.0, 31.0], [1.0, 31.0])

odow avatar Mar 27 '25 22:03 odow

Excited to test this new feature ASAP!

The eval_jacobian function must have the signature

eval_jacobian(ret::AbstractVector, x::AbstractVector)::Nothing

Did you mean ret::AbstractMatrix? If so, there is a similar mistake in the Hessian section.

franckgaga avatar Mar 30 '25 21:03 franckgaga

Nope. ret is a vector, with the non-zeros in the same order as jacobian_structure and hessian_lagrangian_structure.

odow avatar Mar 30 '25 22:03 odow

I have a basic question: what is the meaning of "oracle"? Is it the same thing as a vector field?

(P.S.: my background is electrical engineering, not operational research.)

franckgaga avatar Apr 02 '25 22:04 franckgaga

Oracle meaning something you can ask a question of and get back an answer. We could have named this "VectorNonlinearCallback" but oracle is a fairly (?) standard term-of-art (if hard to search for because of a certain company). Nothing to do with fields.

(Okay, I've never actually thought of how someone might try to google this. "What is Oracle + Software?" Nope. "Oracle + query?" Nope.)

(Another edit: https://math.stackexchange.com/questions/2613109/what-is-the-purpose-of-an-oracle-in-optimization, https://en.wikipedia.org/wiki/Oracle_complexity_(optimization))

(Another-nother edit: turns out Oracle (the company) can solve NLPs with the "Oracle Decision Optimizer OptQuest" https://docs.oracle.com/cd/E57185_01/CBREG/ch07s02s01s04.html)

odow avatar Apr 02 '25 22:04 odow

Alright, thanks! First time I'm hearing it (I'm French, so that may not help).

I tried to google the term and failed miserably, hindered by that certain company, haha.

franckgaga avatar Apr 02 '25 22:04 franckgaga

Sometimes things go the other way. In the stochastic programming world, they use terms like "hazard-decision" to mean "observe the uncertainty first, then make a decision", but the term is meaningless in English, and thoroughly confused me until I worked out it was a transliteration of the French "hasard"

odow avatar Apr 02 '25 22:04 odow

To add to @odow's explanation, the main advantage of this feature over add_nonlinear_operator is in the case where:

  1. You would like to use a vector-valued external function with second derivatives and
  2. you can compute the Hessian of the Lagrangian (the weighted sum of each output Hessian) more efficiently than the naive approach of evaluating each output Hessian individually. This can often be accomplished using adjoint methods; see the sketch below.
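For intuition, here is a minimal sketch of that one-shot weighted Hessian (it mirrors the ForwardDiff example in the issue body; the function and numbers are illustrative):

julia> using ForwardDiff

julia> f(x) = [x[1]^2, x[2]^2 + x[3]^3];

julia> μ = [1.0, 2.0];

julia> ForwardDiff.hessian(x -> μ' * f(x), [1.0, 2.0, 3.0])  # == ∑ μᵢ ∇²fᵢ(x)
3×3 Matrix{Float64}:
 2.0  0.0   0.0
 0.0  4.0   0.0
 0.0  0.0  36.0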

Robbybp avatar Apr 13 '25 20:04 Robbybp

Hey @odow, nice feature and syntax!

Is this new set part of the solution to use multithreaded callbacks in Knitro? (I'm thinking about https://github.com/jump-dev/KNITRO.jl/issues/93#issuecomment-1841653317)

guimarqu avatar May 13 '25 13:05 guimarqu

I think no. There are many problems with a multithreaded callback that I don't see us resolving anytime soon.

odow avatar May 13 '25 18:05 odow

I still want to test the interface, but I have been quite busy lately, so I have not had the time yet.

My main usage would be for ModelPredictiveControl.jl. It is based on JuMP.jl mainly because of the solver-independence feature. Because of this, my vote would go to option 2 or 3 above.

I have two questions about the experimental functionality:

  1. Why did you choose to store the Hessian as a vector + tuple of indices? Why not use Julia's standard-library SparseArrays? It seems that it will clash with OffsetArrays.jl (but I agree that it's a low-priority feature).
  2. Are you aware of a simple way of converting a Hessian from a SparseMatrixCSC{Tv,Ti} to a vector + tuples of Ints? I'm asking because I use DifferentiationInterface.jl, and the sparse differentiation backends return a SparseMatrixCSC{Tv,Ti}.

Also @Robbybp, could you elaborate a bit more on "adjoint methods"? I tried to find references on this in the context of the Hessian of a Lagrangian, but I found nothing. Thanks!

franckgaga avatar Jul 13 '25 15:07 franckgaga

Why did you choose to store the Hessian as a vector + tuple of indices

Because it was simple, and because that's the format that MOI requires.

It seems that it will clash with OffsetArrays.jl

Yes. We explicitly chose Base.Vector so that we don't need to worry about AbstractArray types and their various methods/MethodErrors. We will never support AbstractArray here.

Are you aware of a simple way of converting a Hessian

julia> s = SparseArrays.sparse([1, 1, 2], [1, 2, 2], rand(3))
2×2 SparseMatrixCSC{Float64, Int64} with 3 stored entries:
 0.725471  0.556755
  ⋅        0.601829

julia> I, J, V = SparseArrays.findnz(s)
([1, 1, 2], [1, 2, 2], [0.7254708507838116, 0.5567545415928516, 0.6018292600385067])

julia> indices = collect(zip(I, J))
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (1, 2)
 (2, 2)

julia> V
3-element Vector{Float64}:
 0.7254708507838116
 0.5567545415928516
 0.6018292600385067

because I use DifferentiationInterface.jl, and the sparse differentiation backends return

Symmetric? Non-symmetric?

could you elaborate a bit more on "adjoint methods"?

See

https://github.com/lanl-ansi/MathOptAI.jl/blob/7c6b567bde1ac2923939193332d055cc0ea501ca/ext/MathOptAIIpoptPythonCallExt.jl#L86-L123

odow avatar Jul 13 '25 21:07 odow

Okay, I did some tests without exact Hessians, since this is what I do right now in MPC.jl (using the memoization trick from the JuMP tutorial). It's a fairer comparison.

  • First, good work! I did not notice any major bugs in the implementation.
  • If our objective function is defined with a user-defined nonlinear operator, are we still limited to the splatting syntax? I guess the performance impact here is much lower than for nonlinear constraints...
  • The way _VectorNonlinearOracle sets are printed in the REPL, with the temporary names of the anonymous functions:
    julia> set
    Ipopt._VectorNonlinearOracle(3, 1, [0.0], [0.0], var"#1#4"(), [(1, 1), (1, 2), (1, 3)], var"#2#5"(), [(1, 1), (2, 2)], var"#3#6"(), [0.0, 0.0, 0.0])
    
    is simple enough, but it is problematic when the functions are defined in closures. It is common to see them in stacktraces: in my use case, it prints > 10000 lines! The default maximum in VS Code is 10000 lines, so if there is an exception, everything becomes unreadable. I guess you will add pretty-printing later?
  • This is preliminary and my code is super dirty right now, but the performance gains compared to the old implementation with splatting + memoization are already there! Look for yourself here (the CI is still running right now), or locally on my computer:

| Benchmark (CASE STUDIES/PredictiveController/Pendulum/NonLinMPC) | main | dirty | main / dirty |
| --- | --- | --- | --- |
| Custom constraints/Ipopt/MultipleShooting | 1.03 ± 0.0057 s | 0.378 ± 0.029 s | 2.72 ± 0.21 |
| Custom constraints/Ipopt/SingleShooting | 0.841 ± 0.01 s | 0.792 ± 0.019 s | 1.06 ± 0.029 |
| Economic/Ipopt/MultipleShooting | 0.499 ± 0.0049 s | 0.189 ± 0.029 s | 2.65 ± 0.41 |
| Noneconomic/Ipopt/MultipleShooting | 0.465 ± 0.013 s | 0.176 ± 0.026 s | 2.64 ± 0.4 |

It's about 2.5× faster on average. The more nonlinear constraints there are, the higher the speed gain (see e.g. custom constraints with multiple shooting).

So overall, this set is a huge improvement IMO, not only for efficient Hessian handling, but straight-up for efficient nonlinear constraints, with or without exact Hessians. It should be available for all NLP solvers in the MOI ecosystem.

franckgaga avatar Jul 24 '25 01:07 franckgaga

If our objective function is defined with a user-defined nonlinear operator, are we still limited to the splatting syntax?

Yes. You could do an epigraph reformulation though with min t; t >= f(x)
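A minimal sketch of that reformulation with the oracle, minimizing $f(x) = x_1^2 + x_2^2$ over $x \ge 1$ (the set and model here are illustrative):

julia> using JuMP, Ipopt

julia> epi_set = Ipopt._VectorNonlinearOracle(;
           dimension = 3,
           l = [-Inf],
           u = [0.0],
           # row 1 is f(x) - t, with the input ordered as v = [x₁, x₂, t]
           eval_f = (ret, v) -> (ret[1] = v[1]^2 + v[2]^2 - v[3]),
           jacobian_structure = [(1, 1), (1, 2), (1, 3)],
           eval_jacobian = (ret, v) -> begin
               ret[1] = 2.0 * v[1]
               ret[2] = 2.0 * v[2]
               ret[3] = -1.0
               return
           end,
       );

julia> begin
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x[1:2] >= 1)
           @variable(model, t)
           @objective(model, Min, t)  # the epigraph variable bounds f(x) from above
           @constraint(model, [x; t] in epi_set)
           optimize!(model)
           assert_is_solved_and_feasible(model)
       end;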

I guess you will add pretty-printing later

Yes, we can do that
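Something along these lines would do it (a minimal sketch; it assumes the struct has a dimension field matching the constructor keyword, and the real field names may differ):

# Sketch only: print the dimension and hide the callback fields.
function Base.show(io::IO, set::Ipopt._VectorNonlinearOracle)
    return print(io, "Ipopt._VectorNonlinearOracle(dimension = ", set.dimension, ")")
end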

odow avatar Jul 24 '25 03:07 odow

Yes. You could do an epigraph reformulation though with min t; t >= f(x)

I just learned something! My guess would have been an equality constraint, but it is really an inequality here, to preserve convexity, from what I read. I'm not sure it would be worth it, though: I would lose some code readability, and a single splatted function for the objective is presumably fast enough, compared to multiple functions for the nonlinear constraints.

Would you be interested in benchmarks including exact Hessians (with sparse ForwardDiff based on DI.jl)?

Otherwise, I will work on other new features in the meantime, before the official public release of the set.

franckgaga avatar Aug 15 '25 15:08 franckgaga

Well. It's the 30th of September. So I guess we need to make a decision. I'm personally in favour of adding it as a set to MOI.

odow avatar Sep 30 '25 01:09 odow

For my research, this is a win with the use of PyTorch surrogates via MathOptAI and other external functions in optimal control problems.

pulsipher avatar Sep 30 '25 03:09 pulsipher

@odow I am working on the MOI interface for Uno.jl, and I added Jacobian-vector / Hessian-vector operators to the oracle. If you merge it into MOI.jl, it would be great to also have support for these operators. To the best of my knowledge, we don't need them in MadNLP.jl / Ipopt.jl, but they are supported in Uno.jl / BQPD. https://github.com/cvanaret/Uno/pull/310

amontoison avatar Sep 30 '25 09:09 amontoison

Here's my proposal to add it to MOI: https://github.com/jump-dev/MathOptInterface.jl/pull/2860

odow avatar Oct 03 '25 03:10 odow

I think it is a relevant set for the nonlinear solvers!

amontoison avatar Oct 06 '25 04:10 amontoison