Cache subexpressions when building MOI.ScalarNonlinearExpression
Here is a simplified reproducer of the performance issue identified in https://github.com/jump-dev/JuMP.jl/issues/4024:
```julia
using JuMP

f(x, u) = [sin(x[1]) - x[1] * u, cos(x[2]) + x[1] * u]

function RK4(f, X, u)
    k1 = f(X, u)
    k2 = f(X + k1 / 2, u)
    k3 = f(X + k2 / 2, u)
    k4 = f(X + k3, u)
    return X + (k1 + 2k2 + 2k3 + k4) / 6
end

model = Model()
@variable(model, q[1:2])
@variable(model, u)
x = q
for m in 1:4
    x = RK4(f, x, u)
end

@time moi_function.(x)
```
Before this PR, this gives:
```
14.610110 seconds (207.45 M allocations: 6.927 GiB, 68.63% gc time)
```
After this PR, this gives:
```
0.000103 seconds (2.16 k allocations: 74.250 KiB)
```
Note that the `MOI.ScalarNonlinearFunction`s we generate now share common sub-expressions by pointer! If I'm not mistaken, we don't support modifying these in-place, so that shouldn't be an issue.
This fixes the slow model-building issue, but ReverseAD will still be terribly slow. One thing we could do is to detect, in MOI.Nonlinear, the `MOI.ScalarNonlinearFunction`s that share sub-functions corresponding to the same object (with a dictionary). When it detects two functions with the same pointer, it can then create subexpressions and use its existing support for subexpressions.
The nice thing about this is that we're not making any change to the MOI interface. Creating `MOI.ScalarNonlinearFunction`s this way was already possible from the beginning; we just didn't treat them any differently. So this change would just be a performance optimization of the AD, while the interface stays as simple and non-breaking as it gets.
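The detection described above could be sketched as a single walk over the expression DAG that counts each node by `objectid`; any node seen more than once is a shared sub-function that could be promoted to a subexpression. This is a minimal, self-contained sketch: `SNF`, `count_nodes!`, and `shared_subexpressions` are hypothetical stand-ins, not MOI.Nonlinear API (the real implementation would operate on `MOI.ScalarNonlinearFunction`).

```julia
# Hypothetical stand-in for MOI.ScalarNonlinearFunction, mutable so that
# objectid is identity-based ("pointer equality").
mutable struct SNF
    head::Symbol
    args::Vector{Any}
end

# Count how many times each pointer-identical node appears in the DAG.
function count_nodes!(counts::Dict{UInt64,Int}, f::SNF)
    id = objectid(f)
    counts[id] = get(counts, id, 0) + 1
    counts[id] > 1 && return  # already visited; children were counted once
    for arg in f.args
        arg isa SNF && count_nodes!(counts, arg)
    end
    return
end

# Any node whose objectid appears more than once is a shared sub-function.
function shared_subexpressions(f::SNF)
    counts = Dict{UInt64,Int}()
    count_nodes!(counts, f)
    return Set{UInt64}(id for (id, n) in counts if n > 1)
end

inner = SNF(:sin, Any[:x])
outer = SNF(:+, Any[inner, inner])  # `inner` is aliased twice
shared_subexpressions(outer)        # contains objectid(inner)
```

With the set of shared ids in hand, MOI.Nonlinear could emit one subexpression per shared node and rewrite the parents to reference it, reusing its existing subexpression machinery.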
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (b19c3e7) to head (ee3be7d).
```
@@           Coverage Diff           @@
##            master    #4032   +/- ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           43       43
  Lines         6149     6160    +11
=========================================
+ Hits          6149     6160    +11
```
> If I'm not mistaken, we don't support modifying these in-place so that shouldn't be an issue.

In theory we don't, but they can be modified in place. This change needs very, very careful consideration.
Yes, it has nontrivial consequences. It may very well jeopardize our chances of adding a modification API for nonlinear constraints in the future.
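To illustrate the concern: once sub-functions are shared by pointer, an in-place modification of one occurrence is silently observable from every expression that aliases it. A minimal sketch, with a hypothetical `Node` type standing in for `MOI.ScalarNonlinearFunction`:

```julia
# Hypothetical mutable expression node used only for this illustration.
mutable struct Node
    head::Symbol
    args::Vector{Any}
end

shared = Node(:sin, Any[:x])
f = Node(:+, Any[shared, 1.0])
g = Node(:*, Any[shared, 2.0])

# Modifying the shared child in place...
shared.head = :cos

# ...changes both f and g at once, since they alias the same object.
f.args[1].head == :cos && g.args[1].head == :cos  # true
```

A future modification API for nonlinear constraints would have to either forbid this or copy aliased nodes on write.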
I think we can simplify this to:
```julia
function my_moi_function(f::GenericNonlinearExpr{V}) where {V}
    cache = Dict{UInt64,MOI.ScalarNonlinearFunction}()
    ret = MOI.ScalarNonlinearFunction(f.head, similar(f.args))
    stack = Tuple{MOI.ScalarNonlinearFunction,Int,GenericNonlinearExpr{V}}[]
    for i in length(f.args):-1:1
        if f.args[i] isa GenericNonlinearExpr{V}
            push!(stack, (ret, i, f.args[i]))
        else
            ret.args[i] = moi_function(f.args[i])
        end
    end
    while !isempty(stack)
        parent, i, arg = pop!(stack)
        parent.args[i] = get!(cache, objectid(arg)) do
            child = MOI.ScalarNonlinearFunction(arg.head, similar(arg.args))
            for j in length(arg.args):-1:1
                if arg.args[j] isa GenericNonlinearExpr{V}
                    push!(stack, (child, j, arg.args[j]))
                else
                    child.args[j] = moi_function(arg.args[j])
                end
            end
            return child
        end
    end
    return ret
end
```
It doesn't hold state across multiple calls to moi_function, but if there are large nested expressions within a single expression (the common case), then we exploit that.
The example gives:
```julia
julia> @time moi_function.(x);
  0.000142 seconds (1.53 k allocations: 106.688 KiB)
```
If this was only about `moi_function`, then I would just get the model from the first `JuMP.VariableRef` I saw and then use the dictionary inside it. The reason I have to change the API is to merge `moi_function` and `check_belongs_to_model`, in view of that benchmark: https://github.com/jump-dev/MathOptInterface.jl/pull/2788#issuecomment-3102384794
We can fix `check_belongs_to_model` as well. I don't want to add the model-level cache. The cache should be within the function.
It would be weird to have the subexpressions only work when they are on the same function, especially since the AD backend supports them being on different ones. I guess we could still find out that they are the same later in post-processing; that could be enough just for the sake of speeding up passing the model to the AD backend without the exponential blowup. One issue is that we would start creating many small dictionaries if there are a lot of constraints, but that's probably negligible. What's the reason for not having a model-level cache?
> It would be weird to have the subexpressions only work when they are on the same function.
This is an implementation detail. Nothing should be observable at the JuMP level.
> Especially since the AD backend supports them being on different ones.
This is also an implementation detail.
> I guess we can still find out they are the same later in post-processing, it could be enough just for the sake of speeding up passing the model to the AD backend without the exponential blowup.
This is my strongly preferred option. I would like us to provide the full tape to an AD engine, and then it to turn everything into a single global DAG.
> One issue would be that we start creating many small dictionaries if there are a lot of constraints, but that's probably negligible.
Yes, we definitely need benchmarks before merging anything like this.
> What's the reason for not having a model-level cache?
First, it turns every nonlinear expression into a long-lived GC object that will never be freed until the model is.
Second, any expression occurring in multiple constraints seems like much less of an issue than the original example, where nested expressions are programmatically created.
Third, if we start with the internal cache, we can always change to a model-level cache later if needed.
That makes sense: if we only take care of not duplicating aliased subexpressions at the level of each function, it's indeed easier. We would then detect subexpressions in the AD backend using hashes, so the aliases created by the user have no impact on the subexpressions used by the AD. It means the user has no control over subexpressions, but on the other hand, relying on the aliasing chosen by the user was probably not the ideal way to give the user that control anyway.
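The hash-based detection could look something like the following sketch, where `Node` and `structural_hash` are hypothetical stand-ins, not existing API: structurally equal subtrees hash equal regardless of whether the user aliased them, and memoizing by object identity keeps the walk linear in the size of the DAG.

```julia
# Hypothetical expression node standing in for the AD backend's tape nodes.
mutable struct Node
    head::Symbol
    args::Vector{Any}
end

# Structural hash of an expression tree, memoized by identity so that
# aliased nodes are hashed only once.
function structural_hash(f::Node, memo = IdDict{Node,UInt}())
    get!(memo, f) do
        h = hash(f.head)
        for arg in f.args
            h = hash(arg isa Node ? structural_hash(arg, memo) : arg, h)
        end
        h
    end
end

a = Node(:sin, Any[:x])
b = Node(:sin, Any[:x])              # distinct object, same structure
structural_hash(a) == structural_hash(b)  # true
```

Subtrees that collide on this hash (confirmed by a structural equality check, since hash collisions are possible) would become the AD backend's subexpressions, independently of how the user built the expressions.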