Cache subexpressions when building MOI.ScalarNonlinearExpression
Here is a simplified reproducer of the performance issue identified in https://github.com/jump-dev/JuMP.jl/issues/4024:
```julia
using JuMP

f(x, u) = [sin(x[1]) - x[1] * u, cos(x[2]) + x[1] * u]

function RK4(f, X, u)
    k1 = f(X, u)
    k2 = f(X + k1 / 2, u)
    k3 = f(X + k2 / 2, u)
    k4 = f(X + k3, u)
    return X + (k1 + 2k2 + 2k3 + k4) / 6
end

model = Model()
@variable(model, q[1:2])
@variable(model, u)
x = q
for m in 1:4
    x = RK4(f, x, u)
end

@time moi_function.(x)
```
Before this PR, this gives:
```
14.610110 seconds (207.45 M allocations: 6.927 GiB, 68.63% gc time)
```
After this PR, this gives:
```
0.000103 seconds (2.16 k allocations: 74.250 KiB)
```
Note that the `MOI.ScalarNonlinearFunction`s we generate now share common sub-expressions by pointer! If I'm not mistaken, we don't support modifying these in-place, so that shouldn't be an issue.
This fixes the slow model-building issue, but ReverseAD will still be terribly slow. One thing we could do is to detect, in MOI.Nonlinear, the `MOI.ScalarNonlinearFunction`s that share sub-functions corresponding to the same object (with a dictionary). When it detects two functions with the same pointer, it can then create subexpressions and use its existing support for subexpressions.
The nice thing about this is that we're not making any change to the MOI interface. Creating `MOI.ScalarNonlinearFunction`s this way was already possible from the beginning; we just didn't treat them any differently. So this change would just be a performance optimization of the AD, while the interface stays as simple and non-breaking as it gets.
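The detection described above could be sketched as a single walk over the expression DAG that counts each node by `objectid`; any node seen more than once is a shared sub-function that could be promoted to a subexpression. This is a minimal, self-contained sketch: `SNF`, `count_nodes!`, and `shared_subexpressions` are hypothetical stand-ins, not MOI.Nonlinear API (the real implementation would operate on `MOI.ScalarNonlinearFunction`).

```julia
# Hypothetical stand-in for MOI.ScalarNonlinearFunction, mutable so that
# objectid is identity-based ("pointer equality").
mutable struct SNF
    head::Symbol
    args::Vector{Any}
end

# Count how many times each pointer-identical node appears in the DAG.
function count_nodes!(counts::Dict{UInt64,Int}, f::SNF)
    id = objectid(f)
    counts[id] = get(counts, id, 0) + 1
    counts[id] > 1 && return  # already visited; children were counted once
    for arg in f.args
        arg isa SNF && count_nodes!(counts, arg)
    end
    return
end

# Any node whose objectid appears more than once is a shared sub-function.
function shared_subexpressions(f::SNF)
    counts = Dict{UInt64,Int}()
    count_nodes!(counts, f)
    return Set{UInt64}(id for (id, n) in counts if n > 1)
end

inner = SNF(:sin, Any[:x])
outer = SNF(:+, Any[inner, inner])  # `inner` is aliased twice
shared_subexpressions(outer)        # contains objectid(inner)
```

With the set of shared ids in hand, MOI.Nonlinear could emit one subexpression per shared node and rewrite the parents to reference it, reusing its existing subexpression machinery.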
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (b19c3e7) to head (ee3be7d).
```
@@           Coverage Diff           @@
##            master    #4032   +/- ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           43       43
  Lines         6149     6160    +11
=========================================
+ Hits          6149     6160    +11
```
> If I'm not mistaken, we don't support modifying these in-place so that shouldn't be an issue.

In theory we don't, but they can be modified in place. This change needs very, very careful consideration.
Yes, it has nontrivial consequences. It may very well jeopardize our chances of adding a modification API for nonlinear constraints in the future.
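To illustrate the concern: once sub-functions are shared by pointer, an in-place modification of one occurrence is silently observable from every expression that aliases it. A minimal sketch, with a hypothetical `Node` type standing in for `MOI.ScalarNonlinearFunction`:

```julia
# Hypothetical mutable expression node used only for this illustration.
mutable struct Node
    head::Symbol
    args::Vector{Any}
end

shared = Node(:sin, Any[:x])
f = Node(:+, Any[shared, 1.0])
g = Node(:*, Any[shared, 2.0])

# Modifying the shared child in place...
shared.head = :cos

# ...changes both f and g at once, since they alias the same object.
f.args[1].head == :cos && g.args[1].head == :cos  # true
```

A future modification API for nonlinear constraints would have to either forbid this or copy aliased nodes on write.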
I think we can simplify this to:
```julia
function my_moi_function(f::GenericNonlinearExpr{V}) where {V}
    cache = Dict{UInt64,MOI.ScalarNonlinearFunction}()
    ret = MOI.ScalarNonlinearFunction(f.head, similar(f.args))
    stack = Tuple{MOI.ScalarNonlinearFunction,Int,GenericNonlinearExpr{V}}[]
    for i in length(f.args):-1:1
        if f.args[i] isa GenericNonlinearExpr{V}
            push!(stack, (ret, i, f.args[i]))
        else
            ret.args[i] = moi_function(f.args[i])
        end
    end
    while !isempty(stack)
        parent, i, arg = pop!(stack)
        parent.args[i] = get!(cache, objectid(arg)) do
            child = MOI.ScalarNonlinearFunction(arg.head, similar(arg.args))
            for j in length(arg.args):-1:1
                if arg.args[j] isa GenericNonlinearExpr{V}
                    push!(stack, (child, j, arg.args[j]))
                else
                    child.args[j] = moi_function(arg.args[j])
                end
            end
            return child
        end
    end
    return ret
end
```
It doesn't hold state across multiple calls to moi_function, but if there are large nested expressions within a single expression (the common case), then we exploit that.
The example gives:
```julia
julia> @time moi_function.(x);
  0.000142 seconds (1.53 k allocations: 106.688 KiB)
```
If this was only about `moi_function`, then I would just get the model from the first `JuMP.VariableRef` I saw and then use the dictionary inside it. The reason I have to change the API is to merge `moi_function` and `check_belongs_to_model`, in view of that benchmark: https://github.com/jump-dev/MathOptInterface.jl/pull/2788#issuecomment-3102384794
We can fix `check_belongs_to_model` as well. I don't want to add the model-level cache. The cache should be within the function.
It would be weird to have the subexpressions only work when they are on the same function, especially since the AD backend supports them being on different ones. I guess we could still find out that they are the same later in post-processing; that could be enough just for the sake of speeding up passing the model to the AD backend without the exponential blowup. One issue is that we would start creating many small dictionaries if there are a lot of constraints, but that's probably negligible. What's the reason for not having a model-level cache?
> It would be weird to have the subexpressions only work when they are on the same function.
This is an implementation detail. Nothing should be observable at the JuMP level.
> Especially since the AD backend supports them being on different ones.
This is also an implementation detail.
> I guess we can still find out they are the same later in post-processing, it could be enough just for the sake of speeding up passing the model to the AD backend without the exponential blowup.
This is my strongly preferred option. I would like us to provide the full tape to an AD engine, and then it to turn everything into a single global DAG.
> One issue would be that we start creating many small dictionaries if there are a lot of constraints, but that's probably negligible.
Yes, we definitely need benchmarks before merging anything like this.
> What's the reason for not having a model-level cache?
First, it turns every nonlinear expression into a long-lived GC object that will never be freed until the model is.
Second, any expression occurring in multiple constraints seems like much less of an issue than the original example, where nested expressions are programmatically created.
Third, if we start with the internal cache, we can always change to a model-level cache later if needed.
That makes sense: if we only take care of not duplicating aliased subexpressions at the level of each function, it's indeed easier. We would then detect subexpressions in the AD backend using hashes, so the aliases created by the user have no impact on the subexpressions used by the AD. It means the user has no control over subexpressions, but on the other hand, relying on the aliasing chosen by the user was probably not the ideal way to give the user that control anyway.
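The hash-based detection could look something like the following sketch, where `Node` and `structural_hash` are hypothetical stand-ins, not existing API: structurally equal subtrees hash equal regardless of whether the user aliased them, and memoizing by object identity keeps the walk linear in the size of the DAG.

```julia
# Hypothetical expression node standing in for the AD backend's tape nodes.
mutable struct Node
    head::Symbol
    args::Vector{Any}
end

# Structural hash of an expression tree, memoized by identity so that
# aliased nodes are hashed only once.
function structural_hash(f::Node, memo = IdDict{Node,UInt}())
    get!(memo, f) do
        h = hash(f.head)
        for arg in f.args
            h = hash(arg isa Node ? structural_hash(arg, memo) : arg, h)
        end
        h
    end
end

a = Node(:sin, Any[:x])
b = Node(:sin, Any[:x])              # distinct object, same structure
structural_hash(a) == structural_hash(b)  # true
```

Subtrees that collide on this hash (confirmed by a structural equality check, since hash collisions are possible) would become the AD backend's subexpressions, independently of how the user built the expressions.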