Model generation speed

Open jkiviluo opened this issue 1 year ago • 25 comments

SpineOpt is slow in generating large models, as many users have reported. This is especially problematic for non-rolling models, where model generation makes up a larger share of the total execution time.

These performance issues may be linked to the widespread use of dicts instead of array-type parameters. @jd-lara gave us good pointers recently, and we'll be pursuing those. There is reason to believe the situation can be improved considerably.

@Tasqu is doing profiling at the moment, and we'll discuss the steps forward at the Mopo project meeting on 22-26 January in Leuven. This issue will then hopefully receive sub-issues detailing what to do next.

added after Leuven:

  • [ ] Improve type-stability
  • [ ] Precompute some of the index sets and store them in memory rather than computing them on the fly and then discarding them
  • [ ] Store data in DataFrames so we can use their optimised join
  • [x] Precompute and save highest and lowest resolution time-slice relationships

maybe also:

  • #606

jkiviluo avatar Jan 18 '24 09:01 jkiviluo

Just to capture here: in conversations with @manuelma, he thought the problems may be in "the joining and unique calls, that require intermediate allocations".

DillonJ avatar Jan 18 '24 09:01 DillonJ

These performance issues may be linked to the widespread use of dicts instead of array type parameters

Emphasis on may be - we do not know yet. @Tasqu maybe you can also profile allocations, see here.

My feeling, but of course also to verify, is we do too many allocations. Sometimes we see several GB of allocations for generating a constraint.
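
For what it's worth, Julia ≥ 1.8 can sample allocations directly from the standard library. A minimal sketch (the run_spineopt(url_in) call is just a placeholder for whatever you want to profile):

using Profile

# Sample allocations while the model is generated (Julia >= 1.8)
Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=0.1 run_spineopt(url_in)  # placeholder call
results = Profile.Allocs.fetch()  # inspect e.g. with PProf.Allocs.pprof()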

manuelma avatar Jan 18 '24 16:01 manuelma

Also to add here: if we could switch constraints off using a higher-level switch, like a method parameter, that would also improve model generation speed. Users have resorted to commenting constraints out because, the way it's currently written, SpineOpt has to do a lot of looping through indices just to discover that a constraint's index set is empty. In many cases, we should know that if a parameter has no values, then we don't need to generate certain constraints at all. This is very low-hanging fruit. I will create an issue.
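
A rough sketch of what such a switch could look like (all names below are hypothetical placeholders, not the actual SpineOpt API):

# Stand-in for a model-level method parameter
use_reserves(m) = get(m.ext, :use_reserves, false)

function add_constraint_reserve_balance!(m)
    # Cheap high-level check first: skip all index looping when reserves are off
    use_reserves(m) || return nothing
    for ind in constraint_reserve_balance_indices(m)  # hypothetical indices function
        # ... build and add the constraint for `ind` ...
    end
end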

DillonJ avatar Jan 18 '24 17:01 DillonJ

Users have resorted to commenting constraints out because, the way it's currently written, SpineOpt has to do a lot of looping through indices just to discover that a constraint's index set is empty

@DillonJ it would be nice to see/reproduce this experiment, do you have more information?

manuelma avatar Jan 18 '24 18:01 manuelma

@manuelma @DillonJ @jkiviluo If you want, I can spoil the surprise for next week 😄 Or you can spoil yourselves, as I've already uploaded my presentation with the initial results of the profiling I did to the Mopo sharepoint.

However, I'd like to point out that there's a ton of information in the full profiling of run_spineopt, and since I'm still new to doing this in Julia, I can't promise to have identified all the main bottlenecks. Furthermore, I've only tested things on 0.8-dev branches, with a model I had lying around that doesn't come close to exercising all of SpineOpt's functionality, so there may be things it misses.

I was thinking of testing one potentially low-hanging speedup fruit today, just to see if I understand the profiling correctly.

Tasqu avatar Jan 19 '24 07:01 Tasqu

@manuelma

@DillonJ it would be nice to see/reproduce this experiment, do you have more information?

@OliverLinsel describes what they did here in detail: https://github.com/spine-tools/SpineOpt.jl/issues/825#issuecomment-1888845713

This goes back to the general point: if a user doesn't want to implement complex reserves, the model shouldn't generate all the associated variables and constraints... and if it costs a lot of performance just to figure out that a user is not using these functionalities, then maybe we need a switch to tell the model at a high level - a method parameter, for example.

DillonJ avatar Jan 19 '24 08:01 DillonJ

@Tasqu I mentioned in the Mopo meeting that our JOINs might be too slow compared to optimised JOINs from, say, the DataFrames.jl package. Here is my benchmark:

using DataFrames
using SpineInterface

# Create two SpineOpt-like `RelationshipClass`es
arr1 = [(unit=i, node=i, direction=((i - 1) % 2 + 1), temporal_block=((i - 1) % 5) + 1) for i in 1:100]
arr2 = [(node=i, stochastic_structure=1) for i in 1:100]
rc1 = RelationshipClass(:rc1, [:unit, :node, :direction, :temporal_block], arr1, Dict(), Dict())
rc2 = RelationshipClass(:rc2, [:node, :stochastic_structure], arr2, Dict(), Dict())

# Create a function to join the two classes on the `node` dimension
# This looks like our typical indices function
function joinrc(rc1, rc2)
	[
		(unit=u, node=n, direction=d, temporal_block=tb, stochastic_structure=ss)
		for (u, n, d, tb) in rc1()
		for ss in rc2(node=n)
	]
end

# Create two DataFrames with exactly the same data as the above classes
df1 = DataFrame(unit=1:100, node=1:100, direction=repeat(1:2, 50), temporal_block=repeat(1:5, 20))
df2 = DataFrame(node=1:100, stochastic_structure=1)

# Compare joins
@time joinrc(rc1, rc2);
@time innerjoin(df1, df2; on=:node);

And I get this (Julia 1.8):

julia> @time joinrc(rc1, rc2);
  0.000698 seconds (7.32 k allocations: 485.391 KiB)

julia> @time innerjoin(df1, df2; on=:node);
  0.000090 seconds (161 allocations: 15.797 KiB)

So the DataFrames version makes about 45 times fewer allocations (using roughly 30 times less memory) and is almost 8 times faster.

manuelma avatar Jan 22 '24 15:01 manuelma

I posted this to gitter,

It feels like the idea of storing indices makes sense... as Topi suggests, we are repeatedly recomputing and storing the same indices... each time a variable is used, for example. It seems like we could be storing multiple copies of indices anyway, so maybe the memory usage would not be worse? Could we not simply store the indices along with the variable in m.ext[variables] (or wherever it is)?

DillonJ avatar Jan 22 '24 15:01 DillonJ

@DillonJ you may be right, but we can't possibly store all the indices (can we? maybe I'm wrong) - it seems we would use too much memory. Sometimes computing and discarding stuff lets you reuse that memory and actually fit your model in memory at all. If we store the same subset of indices in many places, we might fill the memory. We need to find the right balance.

manuelma avatar Jan 22 '24 15:01 manuelma

we can't possibly store all the indices

Not all indices or subsets of indices, but if we stored variable indices instead of the index function, perhaps that would be possible?
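
For illustration, a minimal sketch of the idea (the m.ext key and the unit_flow_indices function are made up for the example):

inds = collect(unit_flow_indices(m))                  # computed exactly once
m.ext[:spineopt_variable_indices][:unit_flow] = inds  # stored next to the variable

# Later, constraint code reads the cached array instead of recomputing it:
for ind in m.ext[:spineopt_variable_indices][:unit_flow]
    # ... use `ind` ...
end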

DillonJ avatar Jan 22 '24 16:01 DillonJ

Some conclusions of the Mopo hackathon session:

  • Improve type-stability
  • Precompute some of the index sets and store them in memory rather than computing them on the fly and then discarding them
  • Store data in DataFrames so we can use their optimised join
  • Precompute and save highest and lowest resolution time-slice relationships

manuelma avatar Jan 24 '24 10:01 manuelma

More selective variable history creation should also help: https://github.com/spine-tools/SpineOpt.jl/issues/606

DillonJ avatar Jan 24 '24 10:01 DillonJ

SpineOpt is slow in generating large models as has been reported by many users

@jkiviluo can we get in touch with those users to maybe get more details? I think we need a frame of reference here. Slow compared to what? It would be valuable to know what kinds of models they are trying to build, what their alternative tool is, and how much slower SpineOpt is compared to it. Otherwise it will be difficult to set goals for this work.

manuelma avatar Jan 30 '24 11:01 manuelma

A question... I'm looking at JuMP performance tips here: https://jump.dev/JuMP.jl/stable/tutorials/getting_started/performance_tips/

The second tip is to disable bridges if none are used... are we using bridges? If so, do we need them? Can we disable them?

Using direct mode is mentioned as another way to improve performance. Can we use this?

Finally, using string names has an overhead... we could disable these as an option, while making sure they can be turned back on when trying to diagnose issues. (This might not work, since you can then no longer look up variables by name.)

@manuelma @abelsiqueira @datejada

DillonJ avatar Feb 26 '24 11:02 DillonJ

@DillonJ Both model = direct_model(...) and set_string_names_on_creation(model, false) have been used to improve performance of the so-called IJKLM model, see https://jump.dev/2023/07/20/gams-blog/.
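
For reference, both settings look like this in JuMP (HiGHS is just an example solver here; SpineOpt may wire this up differently):

using JuMP, HiGHS

# Direct mode: build the model straight in the solver, skipping JuMP's caching layer
model = direct_model(HiGHS.Optimizer())

# Skip creating string names for variables and constraints
set_string_names_on_creation(model, false)

# Alternative if direct mode proves too restrictive: keep the cache, drop the bridges
# model = Model(HiGHS.Optimizer; add_bridges = false)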

awgutierrez avatar Feb 27 '24 14:02 awgutierrez

Thanks @awgutierrez - it looks like direct_model() halved the model build time. We already have a run_spineopt() keyword argument to use this, so we should probably recommend it to users.

@manuelma do you know if direct_model() constrains any features?

DillonJ avatar Feb 27 '24 14:02 DillonJ

@manuelma @abelsiqueira Just a thought, but perhaps a relatively quick thing we could try to improve the performance of the indices (e.g. #925) is to replace Arrays with Generators and Iterators? Large Arrays in Julia seem to take a lot more time and memory to allocate, and we do this over and over in the indexing functions. See a quick test below:

# Generate a bunch of random numbers
julia> @time r = rand(Int(1e8));
  0.134914 seconds (2 allocations: 762.939 MiB)

# Array including all elements of r less than 0.5
julia> @time vec = [i for i in r if i < 0.5];
  0.858676 seconds (28.54 k allocations: 491.654 MiB, 3.94% compilation time)

# Iterator over all elements of r less than 0.5; ~100x faster to create than the array
julia> @time iter = (i for i in r if i < 0.5);
  0.007780 seconds (839 allocations: 59.555 KiB, 98.37% compilation time)

I'm not sure if our indices need to be Arrays for some of the logic we perform on them, or if we could replace them with Iterators. Any thoughts?

Tasqu avatar Mar 14 '24 09:03 Tasqu

Defining the generator takes no time; actually iterating it takes all the time. We can't just define generators and leave them alone - we need to iterate them to actually collect the indices.

manuelma avatar Mar 14 '24 09:03 manuelma

@manuelma True, I suppose. However, at the moment each indexing function call returns an Array, and I think a lot of time and memory might be wasted allocating all of the intermediary indexing results. If we did all the intermediary indexing logic with Generators, we might save something?
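
To illustrate with toy data (not SpineOpt's actual indexing functions): chaining lazy generators defers all allocation to a single final collect, whereas the eager style allocates a full Array at every step.

data = rand(10^7)

# Eager: every intermediate step allocates a full Array
step1 = [x for x in data if x < 0.5]
step2 = [2x for x in step1 if x > 0.1]

# Lazy: nothing is allocated until the final collect
lazy1 = (x for x in data if x < 0.5)
lazy2 = (2x for x in lazy1 if x > 0.1)
result = collect(lazy2)  # a single allocation, at the end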

Tasqu avatar Mar 14 '24 10:03 Tasqu

Sounds like the idea could be benchmarked relatively easily for an existing constraint?

DillonJ avatar Mar 14 '24 10:03 DillonJ

@Tasqu how do we do the unique part with Generators?

manuelma avatar Mar 14 '24 10:03 manuelma

@manuelma That's why it's just a thought 😅 I'm not very familiar with Generators and Iterators, so I don't know if they support all the logic we'd need. In any case, a lot of the unique calls in the indexing functions are technically superfluous - they're only there to make absolutely sure we don't get accidental duplicate indices.
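
Worth noting: Base's unique already accepts arbitrary iterators, so the intermediate steps can stay lazy and only the final deduplicated array gets allocated. A fully lazy variant is also possible; a sketch (single-pass only, since the Set fills up as you iterate):

gen = (i % 10 for i in 1:10^6)  # lazy, nothing allocated yet
unique(gen)                     # allocates only the deduplicated result

# Fully lazy unique; the returned generator can only be iterated once
function lazy_unique(itr)
    seen = Set()
    (x for x in itr if !(x in seen) && (push!(seen, x); true))
end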

Tasqu avatar Mar 14 '24 10:03 Tasqu

Right, haha, sorry for that. Yeah, I think it's worth exploring. I think you're probably right on the money that the intermediate allocations are what's killing us the most.

manuelma avatar Mar 14 '24 10:03 manuelma

Indeed, traversing the iterator is what takes the time in this case. On #925, we are looking at constraint_connection_flow_capacity_indices. It returns a generator, and it looks like it is traversed as an iterator. Just calling the function is fast, but collecting the result is slow.

In fact, that is one of the reasons to pre-allocate these indices, i.e. return Arrays but store them somewhere. Maybe this would be easy to implement using Memoize.jl. (Edit: as a palliative measure.)
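
A sketch of what that could look like with Memoize.jl (the wrapper name is made up; @memoize caches results per distinct argument):

using Memoize

# Hypothetical sketch: collect the generator once per model, then reuse the array
@memoize function connection_flow_capacity_inds(m)
    collect(constraint_connection_flow_capacity_indices(m))
end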

abelsiqueira avatar Mar 14 '24 11:03 abelsiqueira

The test with the generators has priority, but if the indices do need to be available in memory as arrays, would it make sense to check how many times the exact same set of indices is generated?

With that information we could perhaps group similar constraints and call the indices function only once per group (operation-scale, investment-scale, ...)?
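
A minimal sketch of that grouping idea (all names hypothetical):

# Compute each group's index set once and share it across the group's constraints
const GROUP_INDICES = Dict{Symbol,Vector}()

function indices_for_group(group::Symbol, indices_fn, m)
    get!(GROUP_INDICES, group) do
        collect(indices_fn(m))  # evaluated only the first time per group
    end
end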

tarskul avatar Mar 15 '24 12:03 tarskul