KNITRO.jl
`optimize!` has large overhead
I'm solving multi-period ACOPF problems with Knitro and Ipopt, and `optimize!` with Knitro seems to have a large overhead compared to the time the solver reports. This can be observed with the following script:
```julia
import GOC3Benchmark as goc
import JuMP
import KNITRO
import Ipopt

problem_file = "./scenario_002.json"
input_data = goc.get_data_from_file(problem_file)
@time model = goc.get_multiperiod_acopf_model(input_data)

JuMP.set_optimizer(model, KNITRO.Optimizer)
@time JuMP.optimize!(model)

JuMP.set_optimizer(model, Ipopt.Optimizer)
@time JuMP.optimize!(model)
```
Here, `./scenario_002.json` is the `C3E4N00617_20231002/D2/C3E4N00617D2/scenario_002.json` file in `C3E4N00617_20231002.zip`, which can be downloaded from this webpage. GOC3Benchmark is the open-source version of the Grid Optimization Competition Challenge 3 benchmark algorithm.
I get the following results:

| Solver | `optimize!` time (s) | Time reported by solver log (s) |
| --- | --- | --- |
| Knitro | 140 | 58 |
| Ipopt | 1228 | 1222 |
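For reference, the solver-reported time can also be queried programmatically: `JuMP.solve_time` returns the `MOI.SolveTimeSec` attribute. A minimal sketch of how one might compute the overhead directly (the helper name is made up, and it assumes the solver's wrapper supports `MOI.SolveTimeSec`):

```julia
import JuMP

# Hypothetical helper, not part of the script above: compare the wall-clock time
# of `optimize!` against the time the solver itself reports.
function measure_overhead(model)
    wall = @elapsed JuMP.optimize!(model)
    reported = JuMP.solve_time(model)  # seconds, as reported by the solver
    return (wall = wall, reported = reported, overhead = wall - reported)
end
```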
KNITRO.jl seems to be spending more time than I expect in data structure initialization. The discrepancy between Knitro's "actual" and "reported" times grows as I try larger problems from the GOC E4 datasets, and it is not present when I solve the same problems with the `knitroampl` executable. If this overhead is intended, or unavoidable for some reason, feel free to close the issue.
Versions

- Julia 1.10.0
- JuMP 1.20.0
- Ipopt 1.6.2
- KNITRO 0.14.1
- Platform: M1 Mac
It's plausible that there are some performance improvements that we could make. I haven't benchmarked the package closely. Ipopt is very "simple" so it also doesn't surprise me that KNITRO has more overhead.
Thank you for including a MWE. I am able to reproduce your issue on my laptop. Investigating it further, it looks like the bottleneck is in `MOI.copy_to`. If I do:

```julia
import JuMP: MOI

model = goc.get_multiperiod_acopf_model(input_data)
optimizer = KNITRO.Optimizer()
MOI.copy_to(optimizer, model)
```

I observe that we spend ~70 s in the `copy_to` operation. I tried to generate a detailed performance profile, but without success.
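As an aside, a flat, text-only profile from the standard-library `Profile` module can sometimes work when the graphical tools do not. A sketch, assuming the `optimizer` and `model` from the snippet above:

```julia
using Profile

# Profile only the copy, then print a flat, count-sorted summary to stdout.
Profile.clear()
@profile MOI.copy_to(optimizer, model)
Profile.print(format = :flat, sortedby = :count)
```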
I think there is a key difference compared to Ipopt: Knitro implements an incremental interface, meaning that instead of passing the model to the optimizer all at once, we build it incrementally. Each time we add new variables or new constraints, we have to reallocate some memory inside the solver, and that can prove expensive when building a large model (as is the case here).
A workaround would be to pass the structure in a vectorized fashion, by passing all the variables and constraints to the solver at once (instead of one by one). This might be slightly related to: https://github.com/jump-dev/JuMP.jl/pull/3716
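To illustrate the two styles at the MOI level, here is a sketch using generic MOI calls (the function names are made up and this is not KNITRO.jl's actual code):

```julia
import MathOptInterface as MOI

# Incremental style: one solver call (and potentially one reallocation) per bound.
function add_lower_bounds_one_by_one(optimizer, x::Vector{MOI.VariableIndex}, lb::Vector{Float64})
    for (xi, li) in zip(x, lb)
        MOI.add_constraint(optimizer, xi, MOI.GreaterThan(li))
    end
end

# Vectorized style: a single call, so the solver can size its data structures once.
function add_lower_bounds_vectorized(optimizer, x::Vector{MOI.VariableIndex}, lb::Vector{Float64})
    MOI.add_constraints(optimizer, x, MOI.GreaterThan.(lb))
end
```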
I only have a limited-size license so I can't test this, but can someone post the result of:
```julia
import GOC3Benchmark as goc
import JuMP
import KNITRO
using ProfileView

problem_file = "./scenario_002.json"
input_data = goc.get_data_from_file(problem_file);

begin  # precompile
    model = goc.get_multiperiod_acopf_model(input_data)
    JuMP.set_optimizer(model, KNITRO.Optimizer)
    @profview JuMP.MOIU.attach_optimizer(model)
end

begin  # actual run
    model = goc.get_multiperiod_acopf_model(input_data)
    JuMP.set_optimizer(model, KNITRO.Optimizer)
    @profview JuMP.MOIU.attach_optimizer(model)
end
```
I also see that:

```julia
julia> model = goc.get_multiperiod_acopf_model(input_data)
A JuMP Model
Maximization problem with:
Variables: 529056
Objective function type: JuMP.AffExpr
`JuMP.NonlinearExpr`-in-`MathOptInterface.EqualTo{Float64}`: 164832 constraints
`JuMP.AffExpr`-in-`MathOptInterface.EqualTo{Float64}`: 24000 constraints
`JuMP.AffExpr`-in-`MathOptInterface.GreaterThan{Float64}`: 47904 constraints
`JuMP.AffExpr`-in-`MathOptInterface.LessThan{Float64}`: 47904 constraints
`JuMP.AffExpr`-in-`MathOptInterface.Interval{Float64}`: 64896 constraints
`JuMP.QuadExpr`-in-`MathOptInterface.EqualTo{Float64}`: 58176 constraints
`JuMP.QuadExpr`-in-`MathOptInterface.LessThan{Float64}`: 81888 constraints
`JuMP.VariableRef`-in-`MathOptInterface.GreaterThan{Float64}`: 499440 constraints
`JuMP.VariableRef`-in-`MathOptInterface.LessThan{Float64}`: 380976 constraints
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.
Names registered in the model: p_balance, p_balance_slack_neg, p_balance_slack_pos, p_branch, p_sdd, pq_eq, pq_lb, pq_ub, q_balance, q_balance_slack_neg, q_balance_slack_pos, q_branch, q_implication_max, q_implication_min, q_sdd, ramp_lb, ramp_ub, shunt_step, va, vm
```

It doesn't seem unreasonable that KNITRO might take a while to build this problem in incremental mode.
Hard to know what the problem is without a profile.
Here's the result with `@profview`:

(profile screenshot)

And here's the result with `@pprof`:

(profile screenshot)

Looks to me like the culprit is `_canonical_quadratic_reduction`, but I'm not really sure what to make of that. Let me know if you want more information from the profile.
Ooof. Yeah. We can improve this:
https://github.com/jump-dev/KNITRO.jl/blob/38d473f9d46a05db90eb9145765345fb968849cf/src/MOI_wrapper.jl#L20-L35
It's costly, especially for small sizes.
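For context, the usual way to avoid that kind of cost is to canonicalize the quadratic function once and fill the output arrays in a single pass. A sketch of that pattern (illustrative only; this is not the package's actual implementation, and a real wrapper would also map MOI variable indices to solver indices and handle MOI's 0.5 x'Qx scaling convention):

```julia
import MathOptInterface as MOI

# Turn a ScalarQuadraticFunction into (row, col, coefficient) triplets,
# allocating each output vector exactly once.
function quadratic_triplets(f::MOI.ScalarQuadraticFunction{Float64})
    g = MOI.Utilities.canonical(f)  # merge duplicate terms and drop explicit zeros
    n = length(g.quadratic_terms)
    rows = Vector{Int}(undef, n)
    cols = Vector{Int}(undef, n)
    vals = Vector{Float64}(undef, n)
    for (k, term) in enumerate(g.quadratic_terms)
        rows[k] = term.variable_1.value
        cols[k] = term.variable_2.value
        vals[k] = term.coefficient
    end
    return rows, cols, vals
end
```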
@Robbybp how did you build the quadratic equality constraints? They don't seem to have any quadratic terms?
Fixed with #296, thanks! The time for `optimize!` is now only 7 seconds longer than what is reported by the solver, which seems reasonable.
> how did you build the quadratic equality constraints? They don't seem to have any quadratic terms?
My best guess is that these are power balance equations on buses that have no shunts, e.g.:
```julia
@constraint(model,
    p_balance[uid in bus_ids],
    sum(p_branch[k] for k in bus_branch_keys[uid], init = 0) ==
    sum(p_sdd[ssd_id] for ssd_id in bus_sdd_producer_ids[uid], init = 0) -
    sum(p_sdd[ssd_id] for ssd_id in bus_sdd_consumer_ids[uid], init = 0) -
    sum(
        shunt_lookup[shunt_id]["gs"]*shunt_step[shunt_id]
        for shunt_id in bus_shunt_ids[uid],
        init = 0
    )*vm[uid]^2
    #gs*vm[uid]^2
)
```
(from https://github.com/lanl-ansi/GOC3Benchmark.jl/blob/a5990590e4ea58488651dcb7fca745bcea34bbea/src/opf_model.jl#L559)
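A quick, illustrative check of that guess (the variables below are made up, not taken from the benchmark model): when the shunt sum is empty it collapses to zero, but multiplying it by `vm^2` still produces a `QuadExpr`, so the constraint is classified as `QuadExpr`-in-`EqualTo` even though it has no nonzero quadratic terms.

```julia
using JuMP

model = Model()
@variable(model, vm)
@variable(model, p)

gs = Float64[]  # a bus with no shunts: the conductance list is empty, so sum(gs) == 0.0
c = @constraint(model, p == sum(gs) * vm^2)
# constraint_object(c).func is a QuadExpr with no nonzero quadratic terms,
# which matches the QuadExpr-in-EqualTo rows in the model summary above.
```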
Great! I wasn't able to reproduce such an extreme discrepancy in my local testing, but I guess it was causing a GC issue or something.
If you notice any performance issues like this, they're often a simple fix away once you profile.