
Build year aggregation

Open koen-vg opened this issue 1 year ago • 22 comments

As mentioned in https://github.com/PyPSA/linopy/issues/248, myopic foresight optimisation comes with a significant progressive memory overhead. The problem seems to be two-fold:

  1. The increasing number of components with different build years leads to increasing memory (and time) requirements in myopic foresight optimisations.
  2. Moreover, in at least some (realistic) cases, there are memory spikes during the model construction in myopic foresight settings.

See https://github.com/PyPSA/linopy/issues/248 for a graph illustrating both effects.

This PR is an attempt at dealing with the first of these two issues; in the process it also solves the second issue in my tests (though I cannot guarantee that that issue is resolved for every configuration).

Changes proposed in this Pull Request

The idea is that myopic foresight optimisation involves adding many near-copies of components, one for each planning horizon, with only one (the one corresponding to the "current" planning horizon) being extendable for each optimisation. For example, you have NO0 0 onwind-2020, NO0 0 onwind-2025, etc.

Now, these components are essentially the same apart from nominal capacity: they have the same bus connections, efficiencies, capacity factors, etc. This means that they could be treated as a single component. That is what this PR does.

The PR introduces two new functions in solve_network.py: an aggregation function and a disaggregation function. The aggregation function would aggregate e.g. the two components NO0 0 onwind-2020, NO0 0 onwind-2025 to a single component NO0 0 onwind. The network is then solved. After solving the disaggregation function reconstructs the two original components NO0 0 onwind-2020, NO0 0 onwind-2025. Ideally, these two functions should be inverses in that first aggregating and then disaggregating results in the original network.

Of course, some information might be lost in aggregation. Importantly, one needs to keep track of the nominal capacity (and lifetime) in each build year so that these capacities can be phased out at the end of their lifetimes correctly.

To deal with this, the aggregation function adds additional columns p_nom-2020, p_nom-2025, etc. to n.df(c) (for each class of components c); the contents of these columns can then be used to reconstruct the original components in the disaggregation function.

Exactly which attributes are stored between aggregation and disaggregation is flexible; currently this stands at p_nom, p_nom_max, lifetime and capital_cost.
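The aggregation/disaggregation round trip described above can be sketched in plain pandas (a hypothetical illustration, not the actual PR code; the function names and the restriction to `p_nom` are assumptions for brevity):

```python
import pandas as pd

def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    # Strip the build-year suffix ("NO0 0 onwind-2020" -> "NO0 0 onwind"),
    # sum p_nom over build years, and stash per-year capacities in
    # extra columns (p_nom-2020, p_nom-2025, ...) for later disaggregation.
    base = df.index.str.replace(r"-\d{4}$", "", regex=True)
    year = df.index.str.extract(r"-(\d{4})$", expand=False)
    agg = df.assign(base=base, year=year).pivot_table(
        index="base", columns="year", values="p_nom", fill_value=0.0
    )
    agg.columns = [f"p_nom-{y}" for y in agg.columns]
    agg["p_nom"] = agg.sum(axis=1)
    return agg

def disaggregate(agg: pd.DataFrame) -> pd.DataFrame:
    # Reconstruct the original per-build-year components from the
    # stored p_nom-<year> columns.
    records = {}
    for base, row in agg.iterrows():
        for col in agg.columns:
            if col.startswith("p_nom-") and row[col] > 0:
                year = col.removeprefix("p_nom-")
                records[f"{base}-{year}"] = row[col]
    return pd.DataFrame({"p_nom": pd.Series(records)})

df = pd.DataFrame(
    {"p_nom": [1.0, 2.0]},
    index=["NO0 0 onwind-2020", "NO0 0 onwind-2025"],
)
round_trip = disaggregate(aggregate(df))
# round_trip reproduces the original two components and their capacities
```

The round trip recovering the original dataframe is exactly the "inverse" property mentioned above.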

Results

I have tested the implementation with the following configuration:

run:
  name: "mem-test"
  disable_progressbar: true
  shared_resources: false
  shared_cutouts: true

enable:
  retrieve_cutout: false

foresight: myopic

scenario:
  ll:
  - v1.5
  clusters:
  - 50
  sector_opts:
  - ""
  - "aggBuildYear"
  planning_horizons:
  - 2020
  - 2025
  - 2030

countries: ["DE", "NL", "BE"]

clustering:
  temporal:
    resolution_sector: "24H"
  build_year_aggregation: false

sector:
  central_heat_vent: true
  transmission_efficiency_enabled: false

electricity:
  extendable_carriers:
    Generator: [OCGT, solar, onwind, offwind-ac, offwind-dc]
    StorageUnit: [battery]
    Store: [H2]
    Link: [H2 pipeline]

  renewable_carriers: [solar, onwind, offwind-ac, offwind-dc]

  estimate_renewable_capacities:
    enable: false

renewable:
  solar:
    cutout: "europe-2013-era5"

solving:
  solver:
    name: "gurobi"

Note that I added a config option and a sector_opts wildcard flag to enable or disable the build year aggregation.

The networks with and without build year aggregation have nearly identical characteristics, which I take as very strong evidence that the aggregation and disaggregation implementations are correct; see e.g. the energy balances in the attached figure.

As for what the whole point of this was: the build year aggregation leads to a drastic decrease in memory footprint. The attached figure shows the memory usage over time for the first and last planning horizons, with and without build year aggregation.

That's an improvement of more than a factor of 5 (from more than 20 GB to about 4 GB)! (The graph includes the aggregation and disaggregation steps.)

Interestingly, the solving time doesn't change much. Evidently Gurobi can solve the non-aggregated problem just as quickly; it just needs a lot more memory.

The aggregation and disaggregation do take some time, leading to a slight overhead. At the resolution of the above test (50 nodes, 24H), this just about breaks even. At lower resolutions, the overhead of the (dis)aggregation means that the total time taken when aggregating build years can be higher than without aggregation; the memory footprint is, however, consistently much lower.

Notes

I wrestled with this aggregation and disaggregation for a little while, and I have to say it's rather gnarly stuff.

As it stands, I think this branch could be ready for general use "at your own risk"; it could be interesting to merge it but disable the aggregation, leaving it an option for people (like me, currently) who need to cut their memory footprint, with the understanding that things could possibly go wrong and you have to double-check that the results are correct.

(Obviously a little bit of polishing is still needed, i.e. documentation, release notes, etc. etc.)

I'd also be interested in hearing about alternative approaches. The radical approach would be for pypsa to provide built-in support for build-years somehow; this would circumvent the aggregation and disaggregation implemented here but would presumably be a pretty substantial change to pypsa. Maybe it's the best solution in the long term, however.

It's also possible that there are neater ways of implementing what I did, and suggestions for improvement are welcome. In particular, it might be possible to optimise some of the operations, since they do take some time (leading to the above overhead) and it feels like it could be faster. I did a little bit of profiling and couldn't find any immediate improvements, however.

Caveats

  • The aggregation overwrites p_nom_min and p_nom_max (and e_nom_min, e_nom_max) for non-extendable components in order to set the corresponding attributes correctly for the aggregated component.
  • Different efficiencies for different build years are only dealt with in a "lossy"/approximate manner in the current implementation: the efficiencies are simply averaged (not a weighted average!) and not reconstructed upon disaggregation. Aggregating extendable components with different efficiencies cannot really be done exactly; an alternative is to only aggregate previous build years (i.e. leaving one non-extendable and one extendable component). In practice I haven't seen this issue affect the results in a significant way, but it could if, for example, the efficiency of some component changes radically from one planning horizon to the next.
  • Objective values are slightly different between aggregated and non-aggregated optimisations, and I think this could be due to the aggregation of capital cost. Aggregated components have to carry the capital cost of the current planning horizon for the optimisation to make sense, but since historical (existing) capacities sometimes have a lower capital cost, the aggregation leads to a larger total. This is of course "fixed" upon disaggregation, since capital cost is stored in columns by build year. In my tests I see total objective differences on the order of 0.5-1%. This could have an impact on near-optimal studies etc.
  • This does not work when some locations have urban central heat buses at some planning horizons and not others, and waste heat recovery is enabled, since this means e.g. some electrolysers with different build years will have different values for bus2. To counter this, I added an option to add central heat buses everywhere, regardless of the fraction of population served by urban central heating.

Checklist

  • [x] I tested my contribution locally and it seems to work fine.
  • [x] Code and workflow changes are sufficiently documented.
  • [x] Changed dependencies are added to envs/environment.yaml.
  • [ ] Changes in configuration options are added in all of config.default.yaml.
  • [ ] Changes in configuration options are also documented in doc/configtables/*.csv.
  • [ ] A release note doc/release_notes.rst is added.

koen-vg avatar May 09 '24 12:05 koen-vg

I could also briefly explain the other changes here apart from the aggregation and disaggregation functions:

  • Of separate interest, I fixed a couple of instances of the same bug as this one: https://github.com/PyPSA/PyPSA/pull/880, in solve_network.py. (This is also the reason why this PR needs pypsa>=0.28)
  • I encountered a strange class of bugs, appearing in make_summary.py (but possibly originating earlier), where some components would "lose" their time-varying data. This only happens when those time-varying data are all zero anyway; for technical reasons they end up exactly 0.0 after aggregation/disaggregation, but as close-to-zero floating point values in a non-aggregated network. Do pypsa networks automatically remove all-zero (or all-default) columns from time-varying dataframes, and could this lead to problems? I don't know, but this is the reason for the changes to make_summary.py (just a fix: not assuming the presence of said columns and using 0 by default).
  • I also somehow had problems with the reversed column in n.links, leading to netCDF errors saying that bool values are not supported. This is the reason I had to put some effort into getting rid of this column.
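The make_summary.py fix for missing time-varying columns could look something like this (a hedged sketch of the idea, not the literal diff; the function name and call pattern are assumptions):

```python
import pandas as pd

def timeseries_or_zero(pnl_p: pd.DataFrame, names: list) -> pd.DataFrame:
    # Don't assume every component has a column in the time-varying
    # dataframe; reindex with fill_value=0 so columns that were dropped
    # (because they were all zero/default) are treated as zero.
    return pnl_p.reindex(columns=names, fill_value=0.0)

snapshots = pd.date_range("2020-01-01", periods=3, freq="h")
pnl_p = pd.DataFrame({"gen-a": [1.0, 2.0, 3.0]}, index=snapshots)

# "gen-b" has no column (its dispatch was identically zero); it is
# reconstructed as a zero series rather than raising a KeyError.
ts = timeseries_or_zero(pnl_p, ["gen-a", "gen-b"])
```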

(The CI is btw failing because pypsa==0.28.0 isn't quite available yet; use the latest commit instead or wait a little before testing.)

koen-vg avatar May 09 '24 12:05 koen-vg

I've also done some testing on a larger Europe-wide model (config pasted at the bottom of this comment) with 50 nodes and 6-hourly resolution. The results hold at this scale as well (see attached figure); in this case, the build year aggregation saves time as well as memory.

(It is curious that the optimisation at 2050 takes significantly more time than the one at 2020 even with build year aggregation; maybe this is simply down to the enforced CO2 neutrality making the solution space more "complicated"?)

In this Europe-wide test there were (very) minor differences in the modelling results: with build year aggregation there was some use of biogas; without, there wasn't (see attached figure). This could be down to an obscure bug that I haven't caught yet.

Total system costs do look very similar overall (except for biogas again); see attached figure.

The above figures were produced with this configuration:

run:
  name: "mem-test"
  disable_progressbar: true
  shared_resources: false
  shared_cutouts: true

enable:
  retrieve_cutout: false

foresight: myopic

scenario:
  ll:
  - v1.5
  clusters:
  - 50
  sector_opts:
  - ""
  - "aggBuildYear"
  planning_horizons:
  - 2020
  - 2030
  - 2040
  - 2050

clustering:
  temporal:
    resolution_sector: "6H"
  build_year_aggregation: false

sector:
  central_heat_vent: true
  central_heat_everywhere: true
  transmission_efficiency_enabled: false

electricity:
  extendable_carriers:
    Generator: [OCGT, solar, onwind, offwind-ac, offwind-dc]
    StorageUnit: [battery]
    Store: [H2]
    Link: [H2 pipeline]

  renewable_carriers: [solar, onwind, offwind-ac, offwind-dc]

  estimate_renewable_capacities:
    enable: false

renewable:
  solar:
    cutout: "europe-2013-era5"

solving:
  options:
    io_api: "direct"
  solver:
    name: "gurobi"

koen-vg avatar May 10 '24 08:05 koen-vg

It'd be cool to see this in v0.11 :) I've taken the liberty to clean up the history of this branch a little bit and added what I think is sufficient documentation. After also handling marginal costs changing over time, I can now barely see any difference between aggregated and non-aggregated results.

From my side this is good to go!

A couple of quick things:

  • I have no idea why the CI is still failing to set up the mamba environment; on my end it works fine to install pypsa 0.28.0. Does it have to update its repository or something?
  • Talking about tests, I think it's good to leave the build year aggregation off by default, but it might be an idea to add a test for it at some point. I know that testing as it stands is rather rudimentary, but eventually it would be nice to have a proper testing suite (in a similar vein to pypsa) where it would be possible to test, for example, that build year aggregation gives similar results. Are there any strong opinions on whether I should add a simple testing configuration for build year aggregation to test.sh?
  • Right now, build year aggregation is incompatible with time segmentation, since the segmentation can be different between different planning horizons. I would like to fix this by "pinning" the segmentation scheme by planning horizon, but that can be solved in a different PR.
  • On a vaguely related note: is there any policy on configuration validation? Would there be any interest in a simple validation script, called from the Snakefile, with a couple of asserts making sure that the configuration makes sense? It would be relevant for build year aggregation (making sure that segmentation isn't used, that transmission efficiency is turned off, etc.) but also in a bunch of other cases, e.g. https://github.com/PyPSA/pypsa-eur/issues/1016. I would be happy to contribute towards this.

koen-vg avatar May 13 '24 12:05 koen-vg

Actually sorry I take it back about this being good to go; please give me time for one more commit with a couple more fixes. Sorry I had two branches laying around and mixed them up a little.

koen-vg avatar May 13 '24 12:05 koen-vg

Okay now it's finally all good :)

koen-vg avatar May 13 '24 13:05 koen-vg

I'm actually still ironing out some small issues here and there; maybe it's an idea to still wait a little bit before merging.

koen-vg avatar May 21 '24 16:05 koen-vg

Just a few answers so far:

I encountered a strange class of bugs, appearing in make_summary.py (but possibly originating earlier) where some components would "lose" their time-varying data. This only happens when those time-varying data are all 0 anyway

I think this is a known issue: https://github.com/PyPSA/PyPSA/issues/722

I have no idea why the CI is still failing to set up the mamba environment; on my end it works fine to install pypsa 0.28.0.

The environment is cached, maybe that was the cause. Anyway, the issue seems to be resolved.

Talking about tests, I think it's good to leave the build year aggregation off by default, but it might be an idea to add a test for it at some point.

Yes, fully agree.

For right now build year aggregation is incompatible with time segmentation since the segmentation can be different between different planning horizons.

This was addressed in #1065 and should be fine now.

On a vaguely related note: is there any policy on configuration validation? Would there be any interest in a simple validation script, called from the Snakefile, with a couple of asserts making sure that the configuration makes sense?

Yes, could start with a function in _helpers.py, although I wonder how sustainable/maintained it will be.

fneum avatar May 23 '24 13:05 fneum

How about I write a small function in _helpers.py next week to validate build-year-aggregation-specific configuration, just so people who want to use it don't have to track down this PR and read the listed caveats. I think using assert statements (or something similar that throws an exception when a condition isn't met) should be effective at keeping such validation up to date: false positives will be discovered and have to be fixed immediately. (False negatives, i.e. illegal configurations not covered by validation, are of course the status quo anyway.)
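Such a validation helper could be as simple as the following (a hypothetical sketch; the exact config keys checked here are assumptions drawn from the caveats in this thread, not the final implementation):

```python
def validate_build_year_aggregation(config: dict) -> None:
    # Hypothetical validation helper for _helpers.py, called once from
    # the Snakefile; raises AssertionError on an unsupported combination.
    clustering = config.get("clustering", {})
    if not clustering.get("build_year_aggregation", False):
        return  # nothing to check if the feature is off
    sector = config.get("sector", {})
    assert not sector.get("transmission_efficiency_enabled", True), (
        "build year aggregation requires transmission efficiency to be disabled"
    )
    assert sector.get("central_heat_everywhere", False), (
        "build year aggregation requires central heat buses everywhere"
    )
```

The point being that an unsupported combination fails loudly at workflow start rather than producing subtly wrong results.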

By next week I'll have tested the build year aggregation even more myself too. It's currently working fine for me, and by then I should be even more confident that it really works.

There are no remaining outstanding issues from my side that would have to be fixed before merging.

koen-vg avatar May 23 '24 14:05 koen-vg

I hope you don't mind @koen-vg, but I want to move this feature to the v0.12.0 release given that the pending v0.11.0 release is already very big. We can do this in relatively quick succession.

fneum avatar May 24 '24 08:05 fneum

Absolutely no problem! I agree v0.11 is rather momentous already. I'll be using this feature anyway and from my side it's not much of a burden to delay the merge a little. Just get the release out the door first :)

koen-vg avatar May 24 '24 08:05 koen-vg

In my opinion, this feature is mature enough to be merged. I have tested it quite a bit, and it seems to work very well. Build year aggregation now works with the default configuration (except for having to set the central_heat_everywhere option). I have updated the documentation too, which should reflect the current state of the feature well.

The only substantial change I implemented recently for this feature is a reformulation of the solar potential constraint to make it compatible with build year aggregation. I hope the new formulation is easy enough to understand, and I have tested both implementations in parallel to make sure that they give identical results (see attached figure; blue and orange are the old and new solar potential constraint implementations, green is with build year aggregation).

I have documented the new constraint a little bit in the commit introducing it, but in short it now constrains the total amount of solar, both existing and newly built, at any given location, not just the newly built solar capacity.
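The arithmetic behind why the two formulations agree can be illustrated in a few lines (illustrative numbers only; this is not the PR's actual constraint code): bounding existing-plus-new capacity by the total potential is equivalent, for fixed existing capacity, to bounding new builds by the remaining headroom.

```python
import pandas as pd

# Assumed per-location capacities (MW), purely for illustration.
existing = pd.Series({"DE0 0": 5.0, "DE0 1": 2.0})    # fixed, from earlier build years
potential = pd.Series({"DE0 0": 10.0, "DE0 1": 4.0})  # total solar potential

# New formulation: existing + new <= potential. For fixed `existing`,
# this leaves the following headroom for the extendable component:
headroom = (potential - existing).clip(lower=0.0)
```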

koen-vg avatar Sep 06 '24 07:09 koen-vg

Note: this feature also depends on the fix in https://github.com/PyPSA/pypsa-eur/pull/1262! That PR should be merged first, then this one.

koen-vg avatar Sep 06 '24 07:09 koen-vg

@fneum what do you think about merging this before the next release? Or is there anything that's still missing from your point of view (documentation, examples, tests, whatever)? It'll make my life a little easier if I don't have to keep this branch up-to-date :)

It's an opt-in feature so shouldn't make a difference to users. Happy to write in more detail about the implementation, pros and cons, etc. if that would be helpful.

koen-vg avatar Sep 17 '24 07:09 koen-vg

@koen-vg, I'll try to find time this week to review! Sorry that this PR hasn't been prioritized!

fneum avatar Sep 17 '24 09:09 fneum

No worries, I understand that it's hard to find time for these things!

koen-vg avatar Sep 17 '24 10:09 koen-vg

Hmm, I take it that with https://github.com/PyPSA/PyPSA/pull/1038, this implementation could be simplified a little bit, at least in that components that are to be aggregated can just be marked as inactive instead of having to store information about them in ad-hoc extra columns as is currently done. I can have a look at implementing that if we want to use the new functionality, or it can also be done in a separate PR after merging this one.
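The idea enabled by the active attribute can be sketched as follows (plain pandas stand-in for n.generators; names and the helper function are hypothetical):

```python
import pandas as pd

# Instead of deleting aggregated components and storing their data in
# ad-hoc extra columns, keep them in the dataframe and flip "active".
generators = pd.DataFrame(
    {"build_year": [2020, 2025, 2030], "active": True},
    index=["onwind-2020", "onwind-2025", "onwind-2030"],
)

def deactivate_aggregated(df: pd.DataFrame, aggregated: list) -> pd.DataFrame:
    # Mark the components folded into an aggregate as inactive,
    # leaving all their attributes intact for later disaggregation.
    df = df.copy()
    df.loc[aggregated, "active"] = False
    return df

gens = deactivate_aggregated(generators, ["onwind-2020", "onwind-2025"])
```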

koen-vg avatar Sep 23 '24 11:09 koen-vg

Just as a small update, for the record: build year aggregation still doesn't quite seem to eliminate all memory spikes during optimisation preparation. The attached figure shows an example from an optimisation, with and without build year aggregation; note however that some components (with time-varying efficiencies) were excluded from build year aggregation for better accuracy. While the advantage of build year aggregation is clear, even when aggregated there is still a memory spike up to about 33 GB, whereas the built optimisation "only" takes less than 25 GB. Again, in a SLURM (or similar) setup, this leads to a ~35% memory overhead on individual optimisation jobs. That's better than the 100%+ overhead without build year aggregation, of course, and build year aggregation also speeds up the optimisation. But the problem of memory spikes during LP building shouldn't have to be solved by careful model set-up.

(P.S. note to anyone interested in replicating these graphs: you will have to set the memory polling interval to something less than the default of 30 seconds, otherwise the spikes won't be visible.)

koen-vg avatar Oct 07 '24 09:10 koen-vg

Okay I've refactored the implementation, using the new active attribute from pypsa 0.31. It's actually a lot more readable now, I think!

I recommend using the following minimal configuration to test the implementation for anyone that's interested:


run:
  name: "buildyearagg"

foresight: myopic

scenario:
  ll:
  - vopt
  clusters:
  - 45
  opts:
  - ""
  sector_opts:
  - ""
  - "aggBuildYear"
  planning_horizons:
  - 2020
  - 2030
  - 2040
  - 2050

sector:
  central_heat_everywhere: true

clustering:
  temporal:
    resolution_sector: "168H"

solving:
  solver:
    name: gurobi
    options: gurobi-default

Then run the workflow up to plot_summary and compare the energy balances, etc. There are some very minor differences in the modelling results (I think due to efficiencies changing over time, since they are averaged when aggregating over build years), but overall you obtain good accuracy. Certainly the trade-off is more than worth it compared to decreasing spatio-temporal resolution.

koen-vg avatar Oct 11 '24 11:10 koen-vg

Now, the problem is that, while correct, the way the active attribute is implemented means that the current build year aggregation doesn't actually reduce memory overhead properly! It seems that optimisation variables are created for components even if active is set to False. Properly excluding components that are not active is mentioned in https://github.com/PyPSA/PyPSA/pull/1038, but the implementation hasn't been discussed further and will presumably only happen after https://github.com/PyPSA/PyPSA/issues/848. So if we want to keep the implementation of build year aggregation as is, it will only become really useful once the active attribute is taken "more seriously" by PyPSA.

Anyone who still wants the memory benefits could for now use https://github.com/PyPSA/pypsa-eur/pull/1056/commits/f4b208cf3ae137c0ccc25c5735d0acc60e4d448c.

koen-vg avatar Oct 11 '24 11:10 koen-vg

I did manage to fully exclude inactive components from model building in https://github.com/PyPSA/PyPSA/commit/094af75bf5d2f31ccfca39b8b4644e32a25bfb53; that's a pretty quick-and-dirty implementation, but it shows that it works in principle. The results are good: memory is back down to where it's supposed to be.

koen-vg avatar Oct 11 '24 13:10 koen-vg

Hey @koen-vg, sorry that I still did not get to this PR :/ I've just been running out of time. I'll be on a 2-week holiday now, but I will prioritize this after I return.

fneum avatar Nov 14 '24 19:11 fneum

No worries at all @fneum. In any case, it looks like the utility of this PR will be limited until https://github.com/PyPSA/PyPSA/issues/1056 gets fixed. That is, the improved implementation using the active attribute is much neater, but as of now it doesn't actually improve memory usage (which is supposed to be the whole point), since non-active components are still included in model building. So I wouldn't merge this quite yet.

koen-vg avatar Nov 15 '24 07:11 koen-vg