SpineOpt.jl
Model generation speed
SpineOpt is slow in generating large models, as has been reported by many users. This is especially problematic for non-rolling models, where model generation is a larger part of the total execution time.
These performance issues may be linked to the widespread use of dicts instead of array type parameters. @jd-lara gave us good pointers recently and we'll be pursuing those. There is reason to believe that the situation can be considerably improved.
@Tasqu is doing profiling at the moment and we'll discuss the steps forward in the Mopo project meeting on 22-26 January in Leuven. This issue will then hopefully receive sub-issues detailing what should be done next.
added after Leuven:
- [ ] Improve type-stability (see the function barrier sketch below)
- [ ] Precompute some of the index sets and store them in memory rather than computing them on the fly and then discarding them
- [ ] Store data in DataFrames so we can use their optimised join
- [x] Precompute and save highest and lowest resolution time-slice relationships
maybe also:
- #606
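As referenced in the first checklist item, a minimal sketch of one common type-stability fix, the "function barrier" pattern (`total_capacity` and `params` are toy names, not SpineOpt code):

```julia
# Values pulled out of an untyped container (Dict{Symbol,Any}, m.ext, ...) are
# inferred as Any. Passing them through an inner function lets Julia compile a
# specialised method for the concrete runtime type.
function total_capacity(params::Dict{Symbol,Any})
    caps = params[:capacity]  # type-unstable: inferred as Any
    return _total(caps)       # function barrier: specialises on typeof(caps)
end
_total(caps) = sum(caps)
```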
Just to capture here that in conversations with @manuelma he thought that the problems may be in "the joining and unique calls, that require intermediate allocations".
> These performance issues may be linked to the widespread use of dicts instead of array type parameters
Emphasis on *may be* - we do not know yet. @Tasqu maybe you can also profile allocations, see here.
My feeling, though of course it needs verifying, is that we do too many allocations. Sometimes we see several GB of allocations just to generate a single constraint.
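For example, on Julia 1.8+ allocations can be profiled directly. A minimal sketch, where `build_model!(m)` is just a placeholder for whatever call we want to inspect:

```julia
using Profile, PProf

# Record every allocation made by the profiled call
# (use a lower sample_rate for big runs)
Profile.Allocs.@profile sample_rate=1 build_model!(m)

# Inspect the recorded allocations, e.g. in PProf's web UI
PProf.Allocs.pprof(Profile.Allocs.fetch())
```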
Also to add here that if we could switch constraints off using a higher-level switch like a method parameter, that would also improve model generation speed. Users have resorted to commenting constraints out because, the way it's currently written, SpineOpt has to do a lot of looping through indices just to find out that a constraint's index set is empty. In many cases we should know that if a parameter has no values, then we don't need to generate certain constraints at all. This is very low-hanging fruit. I will create an issue.
> Users have resorted to commenting constraints out because, the way it's currently written, SpineOpt has to do a lot of looping through indices just to find out that a constraint's index set is empty
@DillonJ it would be nice to see/reproduce this experiment, do you have more information?
@manuelma @DillonJ @jkiviluo If you want, I can spoil the surprise for next week 😄 Or you can spoil yourselves, as I've already uploaded my presentation with the initial results of the profiling I did to the Mopo sharepoint.
However, I'd like to point out that there's a ton of information in the full profiling for `run_spineopt`, and since I'm still new to doing this in Julia, I can't promise to have identified all the main bottlenecks. Furthermore, I've only tested things on the `0.8-dev` branches, with a model I had lying around that doesn't even come close to touching all of SpineOpt's functionality, so there can be things it misses.
I was thinking of testing one potentially low-hanging speedup fruit today, just to see if I understand the profiling correctly.
@manuelma
> @DillonJ it would be nice to see/reproduce this experiment, do you have more information?
@OliverLinsel describes what they did here in detail: https://github.com/spine-tools/SpineOpt.jl/issues/825#issuecomment-1888845713
This goes back to the general point - if a user doesn't want to implement complex reserves, the model shouldn't generate all the associated variables and constraints... and if it costs a lot of performance just to figure out that a user is not using these functionalities, then maybe we need a switch to tell the model at a high level - a method parameter, for example.
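A hypothetical sketch of what that early exit could look like - the constraint name is made up for illustration, and `indices(...)` stands for an accessor yielding the entities where a parameter has values:

```julia
# Hypothetical sketch: bail out before any index looping when the driving
# parameter has no values in the input data.
function add_constraint_min_up_time!(m)
    isempty(indices(min_up_time)) && return  # no parameter values => nothing to build
    # ... only now run the expensive index computation and generate the constraints
end
```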
@Tasqu I mentioned in the Mopo meeting that our JOINs might be too slow compared to optimised JOINs from say the DataFrames.jl package. Here is my benchmark:
```julia
using DataFrames
using SpineInterface

# Create two SpineOpt-like `RelationshipClass`es
arr1 = [(unit=i, node=i, direction=((i - 1) % 2 + 1), temporal_block=((i - 1) % 5) + 1) for i in 1:100]
arr2 = [(node=i, stochastic_structure=1) for i in 1:100]
rc1 = RelationshipClass(:rc1, [:unit, :node, :direction, :temporal_block], arr1, Dict(), Dict())
rc2 = RelationshipClass(:rc2, [:node, :stochastic_structure], arr2, Dict(), Dict())

# Create a function to join the two classes on the `node` dimension.
# This looks like our typical indices function.
function joinrc(rc1, rc2)
    [
        (unit=u, node=n, direction=d, temporal_block=tb, stochastic_structure=ss)
        for (u, n, d, tb) in rc1()
        for ss in rc2(node=n)
    ]
end

# Create two DataFrames with exactly the same data as the above classes
df1 = DataFrame(unit=1:100, node=1:100, direction=repeat(1:2, 50), temporal_block=repeat(1:5, 20))
df2 = DataFrame(node=1:100, stochastic_structure=1)

# Compare joins
@time joinrc(rc1, rc2);
@time innerjoin(df1, df2; on=:node);
```
And I get this (julia 1.8):
```julia
julia> @time joinrc(rc1, rc2);
  0.000698 seconds (7.32 k allocations: 485.391 KiB)

julia> @time innerjoin(df1, df2; on=:node);
  0.000090 seconds (161 allocations: 15.797 KiB)
```
So the DataFrames version allocates about 30 times less memory and is almost 8 times faster.
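As a side note, `@time` measures a single call and may include compilation; for micro-benchmarks like this, BenchmarkTools.jl gives more stable numbers:

```julia
using BenchmarkTools

# `$` interpolates the globals, so we measure the join itself rather than
# the cost of accessing untyped global variables
@btime joinrc($rc1, $rc2);
@btime innerjoin($df1, $df2; on=:node);
```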
I posted this to gitter.
It feels like the idea of storing indices makes sense... as Topi suggests, we are repeatedly recomputing and storing the same indices... each time a variable is used, for example. It seems like we could be storing multiple copies of indices anyway, so maybe the memory usage would not be worse? Could we not simply store the indices along with the variable in `m.ext[variables]` (or wherever it is)?
@DillonJ you may be right, but we can't possibly store all the indices (can we? maybe I'm wrong). It seems we would be using too much memory. Sometimes computing and discarding stuff allows you to reuse that memory and actually be able to fit your model in memory. If we store the same subset of indices in many places we might fill up the memory. We need to find the right balance.
> we can't possibly store all the indices
Not all indices or subsets of indices, but if we stored variable indices instead of the index function, perhaps that would be possible?
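A hypothetical sketch of what I mean - the `m.ext` keys and the helper are made up, not SpineOpt's actual layout:

```julia
using JuMP

# Hypothetical: collect the indices once when the variable is created and
# store them next to it, so constraint builders can reuse them later.
function add_variable!(m, name::Symbol, indices_fn)
    inds = collect(indices_fn(m))          # computed once
    m.ext[:variable_indices][name] = inds  # kept for later reuse
    m.ext[:variables][name] =
        Dict(ind => @variable(m, base_name = string(name, ind)) for ind in inds)
end
```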
Some conclusions from the Mopo hackathon session:
- Improve type-stability
- Precompute some of the index sets and store them in memory rather than computing them on the fly and then discarding them
- Store data in DataFrames so we can use their optimised join
- Precompute and save highest and lowest resolution time-slice relationships
Being more selective about variable history creation should also help: https://github.com/spine-tools/SpineOpt.jl/issues/606
> SpineOpt is slow in generating large models as has been reported by many users
@jkiviluo can we get in touch with those users to maybe get more details? I think we need a frame of reference here. Slow compared to what? It would be valuable to know what kind of models they are trying to build, what their alternative tool is, and how much slower SpineOpt is compared to that. Otherwise it would be difficult to set up goals for this work.
A question... I'm looking at JuMP performance tips here: https://jump.dev/JuMP.jl/stable/tutorials/getting_started/performance_tips/
The second tip is to disable bridges if none are used... are we using bridges? If so, do we need them? Can we disable them?
Using direct mode is mentioned as another way to improve performance. Can we use this?
Finally, using string names has an overhead... we could disable these as an option, but make sure they can be turned back on when trying to diagnose issues. (This might not work, since you can then no longer look up variables by name.)
@manuelma @abelsiqueira @datejada
@DillonJ Both `model = direct_model(...)` and `set_string_names_on_creation(model, false)` have been used to improve performance of the so-called IJKLM model, see https://jump.dev/2023/07/20/gams-blog/.
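For reference, a minimal sketch of both calls, plus the no-bridges tip mentioned above (HiGHS is just an example solver here):

```julia
using JuMP, HiGHS

# Direct mode: JuMP talks straight to the solver, skipping the caching layer
model = direct_model(HiGHS.Optimizer())
# Skip building "x[i,j]"-style name strings for variables and constraints
set_string_names_on_creation(model, false)

# Alternatively, keep the cache but drop the bridging layer
model2 = Model(HiGHS.Optimizer; add_bridges = false)
```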
Thanks @awgutierrez - looks like `direct_model()` halved the model build time. We already have a `run_spineopt()` keyword argument to use this - so we should possibly recommend this to users. @manuelma do you know if `direct_model()` constrains any features?
@manuelma @abelsiqueira Just a thought, but perhaps a relatively quick thing we could try to improve the performance of the indices (e.g. #925) is to replace `Array`s with `Generator`s and `Iterator`s? Large `Array`s in Julia seem to take up a lot more time and memory to allocate, and we do this over and over with the indexing functions. See a quick test below:
```julia
# Generate a bunch of random numbers
julia> @time r = rand(Int(1e8));
  0.134914 seconds (2 allocations: 762.939 MiB)

# Array including all elements of r less than 0.5
julia> @time vec = [i for i in r if i < 0.5];
  0.858676 seconds (28.54 k allocations: 491.654 MiB, 3.94% compilation time)

# Generator over all elements of r less than 0.5, around ~100x faster than the array
julia> @time iter = (i for i in r if i < 0.5);
  0.007780 seconds (839 allocations: 59.555 KiB, 98.37% compilation time)
```
I'm not sure if our indices need to be `Array`s for some of the logic we perform on them, or if we could replace them with `Iterator`s. Any thoughts?
Defining the generator takes no time; actually iterating it takes all the time. We can't just define generators and leave them alone - we need to iterate them to actually collect the indices.
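To make that concrete with the snippet above - the generator was cheap to define, but the cost comes back as soon as it is consumed:

```julia
# Materialising the generator pays the full cost the array comprehension paid up front
julia> @time collect(iter);
```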
@manuelma True, I suppose. However, at the moment each indexing function call returns an `Array`, and I think there might be a lot of time and memory wasted allocating memory for all of the "intermediary indexing function calls". If we did all the intermediary indexing logic with `Generator`s, we might save something?
Sounds like the idea could be benchmarked relatively easily for an existing constraint?
@Tasqu how do we do the `unique` part with `Generator`s?
@manuelma That's why it's just a thought 😅 I'm not very familiar with `Generator`s and `Iterator`s, so I don't know if they support all the logic we'd need them to support. In any case, a lot of the `unique` calls in the indexing functions are technically superfluous, only there to make absolutely sure we don't get accidental duplicate indices.
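For what it's worth, Base `unique` does accept a generator, but it returns a materialised `Vector` of the distinct values, so the allocation we're trying to avoid partly comes back:

```julia
g = (i % 10 for i in 1:10^6)  # lazy generator, essentially no allocation yet
u = unique(g)                 # consumes the generator, allocates a Vector of distinct values
```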
Right, haha, sorry for that. Yeah, I think it's worth exploring. I think you're probably right on the money that the intermediate allocations are what's killing us the most.
Indeed, traversing the iterator is what takes time in this case. In #925, we are looking at `constraint_connection_flow_capacity_indices`. It returns a generator, and it looks like it is traversed as an iterator. Just running the function is fast, but collecting it is slow.
In fact, that is one of the reasons to pre-allocate these indices, i.e. return `Array`s but store them somewhere. Maybe using Memoize.jl this could be easy to implement. (Edit: as a palliative measure.)
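A minimal sketch of that palliative with Memoize.jl, using a toy stand-in for an indices function:

```julia
using Memoize

# The result is computed once per distinct argument and then served from a cache.
# `toy_indices` is a stand-in for e.g. `constraint_connection_flow_capacity_indices`.
@memoize function toy_indices(n::Int)
    [(conn=c, t=t) for c in 1:n for t in 1:24]
end

toy_indices(1000)  # computed and stored
toy_indices(1000)  # returned from the cache
```

The flip side is that the cached arrays stay in memory, which is exactly the balance @manuelma mentioned above.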
The test with the generators has priority, but if the indices do need to be available in memory as arrays, would it make sense to check how many times the exact same set of indices is generated? With that information we could perhaps group similar constraints and call the indices function only once per group (operation scale, investment scale, ...)?
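A hypothetical sketch of that grouping - all the names here are made up for illustration:

```julia
# Constraint builders that share an indices function are registered together,
# so the (expensive) indices are collected once per group and reused.
constraint_groups = [
    (operation_indices, [add_constraint_capacity!, add_constraint_ramp!]),
    (investment_indices, [add_constraint_investment!]),
]
for (indices_fn, builders) in constraint_groups
    inds = collect(indices_fn(m))  # computed once for the whole group
    for build! in builders
        build!(m, inds)
    end
end
```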