
Sweep across multiple weather years

Open fneum opened this issue 3 years ago • 10 comments

Closes #200 .

Changes proposed in this Pull Request

  • Adds a new wildcard {weather_year} to sweep through particular weather years (e.g. networks/elec{weather_year}_s{simpl}_{clusters}.nc). In design similar to {simpl} wildcard so that it is optional (e.g. networks/elec_s{simpl}_{clusters}.nc still possible). {weather_year} overwrites snapshots: configuration, assuming full year.
  • Hydro generation not available for all years ERA-5 provides (only 1980-2021). Added option to specify fixed norm year renewables: hydro: norm_year: 2013. Falls back to median of available years if no norm year specified and weather year not in hydro generation data. The option exists to approximate missing years in proportion to runoff.
  • Hydro generation data scaled by installed capacity to eliminate changes in generation due to newly built capacity.
  • Load time series are also not available for all years that e.g. ERA-5 provides. Added option to specify a fixed load year: the workflow falls back to the year specified in config.yaml at load: fixed_year: 2013. @martacki implements some synthetic load time series for missing years.
  • Added option to drop February 29 in leap years, so that we always have 8760 hours: enable: drop_leap_days: false, defaults to false.
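
The hydro fallback described in the list above could be sketched roughly as follows. This is only an illustration of the selection order, not the code from this PR: the function name `select_hydro_norm` and the dict layout of `annual_generation` are assumptions made for the example.

```python
from statistics import median


def select_hydro_norm(annual_generation, weather_year, norm_year=None):
    """Pick the annual hydro generation (one country) used to scale inflow.

    annual_generation: dict mapping year -> annual generation
    (hypothetical layout chosen for this sketch).
    """
    if norm_year is not None:
        # A fixed norm year from the config takes precedence.
        return annual_generation[norm_year]
    if weather_year in annual_generation:
        # Generation data exists for the requested weather year.
        return annual_generation[weather_year]
    # Otherwise fall back to the median over all available years.
    return median(annual_generation.values())


stats = {2011: 130.0, 2012: 140.0, 2013: 120.0}
fixed = select_hydro_norm(stats, weather_year=1975, norm_year=2013)
fallback = select_hydro_norm(stats, weather_year=1975)
print(fixed, fallback)
```

With the made-up numbers above, the fixed norm year returns the 2013 value, while the run without a norm year falls back to the median of the three available years.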

How To Use

In config.yaml change the following:

scenario:
  weather_year: [1994,2000,2013]

enable:
  retrieve_cutout: false

atlite:
  cutouts:
    mycutout:
        ...

renewable:
  onwind:
    cutout: "mycutout-{weather_year}" # with wildcard!
  # likewise for other carriers

Then run snakemake -j 99 solve_all_networks.

If you want to use PyPSA-Eur as before, without the {weather_year} wildcard, in config.yaml set:

scenario:
  weather_year: ['']
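
To illustrate why an empty string keeps the old file names, here is a small stand-alone sketch that mimics how wildcard values are substituted into the path pattern from the description. The helper `expand_targets` is hypothetical (it only imitates the idea of Snakemake's expand and is not part of the workflow), and the cluster value "37" is just an example.

```python
from itertools import product

PATTERN = "networks/elec{weather_year}_s{simpl}_{clusters}.nc"


def expand_targets(pattern, **wildcards):
    """Return one path per combination of wildcard values."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, values)))
        for values in product(*(wildcards[k] for k in keys))
    ]


# Sweep across two weather years.
sweep = expand_targets(
    PATTERN, weather_year=["1994", "2013"], simpl=[""], clusters=["37"]
)
# Empty wildcard: file names look exactly as before.
legacy = expand_targets(PATTERN, weather_year=[""], simpl=[""], clusters=["37"])
print(sweep)
print(legacy)
```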

Checklist

  • [ ] I tested my contribution locally and it seems to work fine.
  • [ ] Code and workflow changes are sufficiently documented.
  • [ ] Newly introduced dependencies are added to environment.yaml and environment.docs.yaml.
  • [ ] Changes in configuration options are added in all of config.default.yaml, config.tutorial.yaml, and test/config.test1.yaml.
  • [ ] Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • [ ] A note for the release notes doc/release_notes.rst is amended in the format of previous release notes.

fneum avatar Oct 20 '20 12:10 fneum

Leap years! Set as early as base_networks?

Option A: Ignore/cut leap day.

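def cut_leap_days(n):
    """Drop February 29 from the snapshots of network `n` in place."""
    if not n.snapshots.is_leap_year.any():
        return
    # Keep every snapshot that does not fall on February 29.
    is_leap_day = (n.snapshots.month == 2) & (n.snapshots.day == 29)
    n.set_snapshots(n.snapshots[~is_leap_day])
    # Rescale the weightings so that they sum to 8760 hours again.
    n.snapshot_weightings[:] = 8760 / len(n.snapshots)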

Option B: Accept lower snapshot weightings (e.g. 2.99 instead of 3.0).

[EDIT: but for option B need to consider that we fall back to non-leap years for load and hydro total annual generation, also in leap years data on Feb 29 might be missing (e.g. load)]
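
For reference, the 2.99 in option B can be reproduced with a few lines of arithmetic. This is one reading of option B, assuming a 3-hourly resolution and weightings normalised so that they sum to 8760 hours; the helper names are made up for the example.

```python
import calendar


def hours_in_year(year):
    """Number of hours in a calendar year (8784 in leap years)."""
    return 8784 if calendar.isleap(year) else 8760


def snapshot_weighting(year, resolution_hours, normalise_to=8760):
    """Uniform weighting such that all snapshots sum to `normalise_to` hours."""
    n_snapshots = hours_in_year(year) // resolution_hours
    return normalise_to / n_snapshots


w_leap = snapshot_weighting(2020, 3)     # leap year, leap day kept
w_regular = snapshot_weighting(2013, 3)  # regular year
print(round(w_leap, 4), w_regular)
```

In a leap year the 2928 three-hourly snapshots each get a weighting slightly below 3, whereas a regular year with 2920 snapshots gets exactly 3.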

fneum avatar Nov 17 '20 16:11 fneum

  • [x] Needs update to new atlite version, see #224 .
  • [x] Needs disentanglement of all the different features changed into separate PRs.

euronion avatar Apr 27 '21 11:04 euronion

Currently, when executing with e.g. weather_year=2014, the workflow by default still retrieves the 2013 cutout(s) from Zenodo. Once LSDF @ KIT starts working again, I have several cutouts prepared (not all, but I think 2000-2020) for ERA5 data, including Ukraine. We could maybe upload them to Zenodo as well? And then adapt the retrieve_cutout function to download the specified weather year? I'm happy to prepare additional ones, too. For load data, there are no electricity demand time series for years before 2005 available on OPSD, but I did some work on generating synthetic ones from an existing evaluated package. However, I still need to finish that. If that works out, we could also provide these CSV files on Zenodo, maybe?

Alternatively (if we don't want to upload), there could be an error popping up if the default configuration is specified true in the config.yaml, but weather_year isn't 2013?

martacki avatar Jun 16 '22 14:06 martacki

Wenrui has created cutouts for all years from 1980 I think for ERA5 and Sarah with the new atlite, they are on the LSDF and can be uploaded to Zenodo

lisazeyen avatar Jun 16 '22 15:06 lisazeyen

But excluding Ukraine and Moldova, no? @lisazeyen

martacki avatar Jun 16 '22 15:06 martacki

> But excluding Ukraine and Moldova, no? @lisazeyen

Ah yes without Ukraine and Moldova!

lisazeyen avatar Jun 17 '22 06:06 lisazeyen

> Currently, when executing with e.g. weather_year=2014, the workflow by default still retrieves the 2013 cutout(s) from Zenodo. Alternatively (if we don't want to upload), there could be an error popping up if the default configuration is specified true in the config.yaml, but weather_year isn't 2013?

That is sort of intended. I wanted full backwards compatibility so that the {weather_year} wildcard would stay empty by default. To use the wildcard, more changes are necessary, e.g. in how cutouts are referenced in the config.yaml.

It's a good idea to check the configuration and provide warnings.
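
Such a check could look roughly like the sketch below. It is only an illustration under assumptions: the function name is made up, and it presumes that the relevant switches are enable: retrieve_cutout and scenario: weather_year, as in the config examples at the top of this PR.

```python
import warnings

DEFAULT_CUTOUT_YEAR = "2013"  # year of the cutouts currently retrieved from Zenodo


def check_weather_year_config(config):
    """Warn if the retrieved default cutouts cannot match the weather years.

    Returns the list of weather years that conflict with the retrieved cutouts.
    """
    years = config.get("scenario", {}).get("weather_year", [""])
    retrieve = config.get("enable", {}).get("retrieve_cutout", False)
    # An empty wildcard means "use config.yaml as before" and is always fine.
    mismatched = [y for y in years if str(y) not in ("", DEFAULT_CUTOUT_YEAR)]
    if retrieve and mismatched:
        warnings.warn(
            f"retrieve_cutout is enabled, but weather years {mismatched} "
            f"differ from the {DEFAULT_CUTOUT_YEAR} cutouts that are retrieved."
        )
    return mismatched if retrieve else []


example = {
    "scenario": {"weather_year": ["2013", "2014"]},
    "enable": {"retrieve_cutout": True},
}
conflicts = check_weather_year_config(example)
print(conflicts)
```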

> Once LSDF @ KIT starts working again, I have several cutouts prepared (not all, but I think 2000-2020) for ERA5 data, including Ukraine. We could maybe upload them to Zenodo as well? And then adapt the retrieve_cutout function to download the specified weather year? I'm happy to prepare additional ones, too.

That might be a good idea. It's a lot of data, but downloading it through the CDSAPI takes very long. I would propose that each year gets its own Zenodo repository in this case. The complete time frame 1959 - 2021 would be good (it's at least what I intended to use in the near term).

> For load data, there are no electricity demand time series for years before 2005 available on OPSD, but I did some work on generating synthetic ones from an existing evaluated package. However, I still need to finish that. If that works out, we could also provide these CSV files on Zenodo, maybe?

Depending on size, CSV could go into the Github repository or the data bundle zenodo repository. I think it's an equally valid approach to fix the electricity demand profiles to a single year when evaluating the impact of weather years. At least one needs to detrend the available electricity demand profiles across the years.

> Wenrui has created cutouts for all years from 1980 I think for ERA5 and Sarah with the new atlite, they are on the LSDF and can be uploaded to Zenodo

That's good to know. SARAH doesn't have high priority for me currently, since it's only available for 1983 onwards, and I want to do analysis for 1959 onwards.

fneum avatar Jun 20 '22 08:06 fneum

> That is sort of intended. I wanted full backwards compatibility so that the {weather_year} wildcard would stay empty by default. To use the wildcard, more changes are necessary, e.g. in how cutouts are referenced in the config.yaml.
>
> It's a good idea to check the configuration and provide warnings.

For backward compatibility, wouldn't it be better if the weather_year wildcard was '2013' by default? Otherwise, when allowing any choice, it's not clear which weather year is being used? Or do you mean so that the filenames are all the same (elec.nc instead of elec2013.nc)?

martacki avatar Jun 20 '22 12:06 martacki

No, always having a non-empty {weather_year} wildcard limits our options. The regular way to define the temporal scope of the model is through the config.yaml. This is much more flexible: you could set it to a period from July to June or to one covering two years at once. These things can't be handled by the {weather_year} wildcard.

That is why I would like to make the {weather_year} wildcard really optional and document clearly what has to be changed in the config.yaml to sweep across years.

fneum avatar Jun 20 '22 12:06 fneum

Hi, as hinted at earlier I've worked with Aleksander on a solution to this too, which might give another useful perspective. You can find the code at https://github.com/koen-vg/pypsa-eur/tree/maa-v1; it is used in a larger workflow at https://github.com/aleks-g/intersecting-near-opt-spaces/.

We have tried documenting our approach and modifications to PyPSA-Eur as well as possible. I don't have too much time to look into this right at the moment (after the summer will be better), but the commits in my fork of PyPSA-Eur should be nicely organised. The most important to look at is https://github.com/koen-vg/pypsa-eur/commit/1dce9b0056d9ad409529d17e925aeeca7cd62281. Some differences that I can see with this pull request:

  • In our implementation we allowed the {year} wildcard to represent a range or set of years of the form "1980-2020", facilitating runs over periods of time longer than a year. That might actually not be too hard to include in something like this pull request.
  • Keep in mind that time-dependent data is used in the network clustering procedure. If the network topology should be fixed between different weather years, then a separate "constant" network needs to be used to define the clustering (see https://github.com/koen-vg/pypsa-eur/commit/c6c6f80137f640644767a90b5ff326e125f704bc).
  • We also implemented an option to start each year at a user-specified date and not necessarily the 1st of January; this opens up the possibility to preserve the continuity of meteorological winters, and we have seen that this can have an impact. (See again https://github.com/koen-vg/pypsa-eur/commit/1dce9b0056d9ad409529d17e925aeeca7cd62281.)
  • Beyond ERA5 data (for which we can share the cutouts, but again they don't include Ukraine) we also did some work on load and hydro data: see https://github.com/aleks-g/multidecade-data.
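
As a rough illustration of the range-style wildcard mentioned in the first point, a value such as "1980-2020" could be turned into a list of years along these lines. This is a sketch, not the code from the linked fork: the function name is hypothetical, the end year is assumed to be inclusive, and only a single year or a single range is handled.

```python
def parse_years(spec):
    """Turn a wildcard value like '1980-2020' or '2013' into a list of years."""
    if "-" in spec:
        first, last = (int(part) for part in spec.split("-"))
        # Assumption: the end year is part of the range.
        return list(range(first, last + 1))
    return [int(spec)]


print(parse_years("1980-1983"))
print(parse_years("2013"))
```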

We had a bit of an easier time because we didn't focus on backwards-compatibility in our implementation.

After the summer (August) I would be happy to comment further on this and/or look at whether some of the above could be useful for upstream PyPSA-Eur.

koen-vg avatar Jun 24 '22 13:06 koen-vg

I can see that there is still some work to do in polishing this PR, but I would love for it to be merged :) I just spent a little bit of time merging in the latest changes from the master branch and fixing some existing bugs, and for me things are now running: do check out and consider merging my multiyear-fixed branch! https://github.com/koen-vg/pypsa-eur/tree/multiyear-fixed. Hopefully this saves some work; most of it was just adapting to the new scenario management etc.

I can't say that I've tested everything, but a simple overnight sector-coupled configuration does run all the way to plot_summary which is a good start.

I've made a couple of functional changes while merging which I think make sense, but of course feel free to reconsider:

  • The build_ship_raster and build_natura_raster rules now take only the "default" atlite cutout as input in order to determine extent/bounds in x/y coordinates. Note that the convention is to specify a cutout including weather year (i.e. europe-era5-2013) as the default cutout.
  • Cutouts for heat demand and solar thermal, on the other hand, shouldn't use the (fixed in time) atlite default cutout and should rather use the {weather_year} wildcard. So cutout inputs to the respective rules now use an input function and find the correct cutout based on two newly added config keys: solar_thermal: cutout and sector: heat_demand_cutout; these work the same as the renewables: {tech}: cutout options but can also be set to "default" at which point the input function falls back on the default atlite cutout.
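
The fallback behaviour described in the second point can be pictured with the following stand-alone sketch. It works on a plain dictionary rather than on the Snakemake config, and the function name, the atlite: default_cutout key and the cutouts/<name>.nc path layout are assumptions made for the example, not necessarily what the branch does.

```python
def resolve_cutout(config, *keys):
    """Look up a cutout name under nested config keys.

    The special value "default" is replaced by the default atlite cutout.
    """
    value = config
    for key in keys:
        value = value[key]
    if value == "default":
        value = config["atlite"]["default_cutout"]
    return f"cutouts/{value}.nc"


config = {
    "atlite": {"default_cutout": "europe-era5-2013"},
    "solar_thermal": {"cutout": "default"},
    "sector": {"heat_demand_cutout": "europe-era5-1994"},
}
print(resolve_cutout(config, "solar_thermal", "cutout"))
print(resolve_cutout(config, "sector", "heat_demand_cutout"))
```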

Apart from that I've tried to stay very close to the "original" multiyear implementation. It looks like processing of eurostat data is maybe the biggest open question. For anyone trying to work with this branch themselves, you need to manually place the eurostat-energy_balances-june_2021_edition directory in data; I had this lying around from earlier but forgot where I got it. Presumably how this is dealt with might change anyway, as there's another open PR for updating the eurostat dataset.

koen-vg avatar Mar 01 '24 15:03 koen-vg

Thanks @koen-vg, will try this week.

One discussion we recently had internally was that we could use the new scenario management with the scenarios.yaml instead of introducing a weather year wildcard.

Do you have an opinion about this?

fneum avatar Mar 03 '24 13:03 fneum

Interesting idea! All in all, I think I like it; it seems like it would be a bit easier to implement than a new wildcard, or at least it would make the diff smaller especially with regards to snakemake rules. It could avoid having the annoying boilerplate-like blocks of the form if snakemake.wildcards.weather_year: [...] else: [...] in the scripts; it would be nice to have config["snapshots"] be the definitive and only source of truth on weather/physical time horizon of the network.

Of course it's maybe not quite as convenient when you want to run things over tens of weather years, but in practice I think the most useful application is anyway to work with only a few selected weather years, and either include them as scenarios or run each of your scenarios with a couple of weather years, depending on the application. In that case, just creating a new scenarios/run for each weather year should be fine. There wouldn't be much overhead in disk-space since almost everything in resources depends on the weather year anyway.
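
To make the "one scenario per weather year" idea concrete, here is a hedged sketch of how such scenario entries could be generated. It is not the project's own tooling: the scenario names, the cutout naming template and the choice of the atlite: default_cutout key are assumptions, and the snapshots are assumed to cover one full calendar year with an exclusive end date.

```python
import json


def create_scenarios(years, cutout_template="europe-era5-{year}"):
    """Build one config override per weather year (hypothetical layout)."""
    scenarios = {}
    for year in years:
        scenarios[f"weather-year-{year}"] = {
            # Full calendar year; the end date is assumed to be exclusive.
            "snapshots": {"start": f"{year}-01-01", "end": f"{year + 1}-01-01"},
            "atlite": {"default_cutout": cutout_template.format(year=year)},
        }
    return scenarios


scenarios = create_scenarios([1994, 2013])
print(json.dumps(scenarios, indent=2))
```

The resulting dictionary could then be written out in whatever format the scenario management expects.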

Just a consideration: I think it would be great if you could refer to a cutout name from the configuration, and that there's then some magic in an input function somewhere to "translate" this to a cutout for the right year. Basically like how you can specify renewables: <tech>: cutout: "europe-era5-{weather_year}" in the current multiyear branch. Maybe it shouldn't be {weather_year} to avoid confusion, but maybe something like "europe-era5-YYYY", and then an input function a little like:

import pandas as pd

def cutout_input(wildcards):
    cutout = config_provider("renewables", wildcards.tech, "cutout")(wildcards)
    if "YYYY" in cutout:
        # Derive the year from the start of the snapshots; str.replace needs a string.
        start = config_provider("snapshots", "start")(wildcards)
        cutout = cutout.replace("YYYY", str(pd.to_datetime(start).year))
    return cutout

Something like this could be used for renewable capacity factor time series, heat demand and solar thermal profiles.

On another note, I would love to be able to make summary plots across weather years using the plot_summary rule, which, if I understand it correctly, doesn't currently collect results from across scenarios. But one might want to do this kind of plotting across scenarios anyway, so that's a question that's almost independent of how to implement weather years.

Finally, I think it would be great to eventually support optimisation over multiple weather years at once, but this might actually somehow be easier with a scenario-based approach. In that, multi-year optimisations could be implemented only through collecting the right cutouts in the right input functions, but no messing with potentially different formats of {weather_year} wildcards etc. It would also be a dream to be able to do 1-year optimisations while preserving meteorological winters easily, and in the best case that would only be a case of simply specifying something like start: "2013-07-01", end: "2014-07-01" in config["snapshots"], and this can also be dealt with using some input function magic. Both of these use-cases do seem to reflect that it would be best to have the snakemake workflow and input functions determine which cutout is used depending on config["snapshots"], and not hardcode this in config["renewables"][<tech>]["cutout"] beyond which family of cutouts you want to use (i.e. SARAH vs. ERA5).
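
A minimal sketch of that "input function magic", reduced to plain Python: given the snapshot start and end and a cutout family, work out which yearly cutouts are needed. The function names are hypothetical, the end date is assumed to be exclusive, and yearly cutouts are assumed to be named <family>-<year>.

```python
from datetime import datetime, timedelta


def cutout_years(start, end):
    """Calendar years touched by the half-open interval [start, end)."""
    first = datetime.fromisoformat(start)
    # Step back one second so that an end on January 1 does not pull in a new year.
    last = datetime.fromisoformat(end) - timedelta(seconds=1)
    return list(range(first.year, last.year + 1))


def cutout_names(family, start, end):
    """Names of the yearly cutouts needed to cover the snapshots."""
    return [f"{family}-{year}" for year in cutout_years(start, end)]


winter_preserving = cutout_names("europe-era5", "2013-07-01", "2014-07-01")
calendar_year = cutout_names("europe-era5", "2013-01-01", "2014-01-01")
print(winter_preserving)
print(calendar_year)
```

A July-to-June horizon needs two yearly cutouts, while a plain calendar year needs only one.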

koen-vg avatar Mar 04 '24 12:03 koen-vg

I am quite happy with the removal of the {weather_year} wildcard. The PR is now in a state that works for different single years and reproduces results for the default year 2013 locally. There are some additional features one could add, but I think it would be best to do that step by step in separate PRs (wishlist at the top).

Thanks, @koen-vg, for the transition to scenario management. It was very fast and easy to implement, and it saved a lot of time!

> Just a consideration: I think it would be great if you could refer to a cutout name from the configuration, and that there's then some magic in an input function somewhere to "translate" this to a cutout for the right year.

The create_scenarios.py snippet mentioned at the top of the PR (updated now) somewhat covers this.

> On another note, I would love to be able to make summary plots across weather years using the plot_summary rule, which, if I understand it correctly, doesn't currently collect results from across scenarios.

Yes, that's the plan, but for a separate PR.

> Finally, I think it would be great to eventually support optimization over multiple weather years at once, but this might actually somehow be easier with a scenario-based approach.

Yes, that's a feature I would also want, but I would propose this for a separate PR after this one (just to keep the changes somewhat digestible). It requires a rule to merge cutouts, but I think you already implemented this in https://github.com/aleks-g/intersecting-near-opt-spaces/, right? In the scripts, there shouldn't be too many hurdles as far as I can see. I know of one I built on purpose for now in pop_weighted_energy_totals, which should be straightforward to resolve.

fneum avatar Mar 14 '24 17:03 fneum

> Thanks, @koen-vg, for the transition to scenario management. It was very fast and easy to implement, and it saved a lot of time!

Thanks, glad to know that I could help a little!

> > Just a consideration: I think it would be great if you could refer to a cutout name from the configuration, and that there's then some magic in an input function somewhere to "translate" this to a cutout for the right year.
>
> The create_scenarios.py snippet mentioned at the top of the PR (updated now) somewhat covers this.

Yes, this looks nice to me; if the scenario system is used anyway to select the weather year, then it seems very sensible to use it also to select the correct cutout for the solar thermal etc. cutouts instead of adding extra complexity in input functions. One could always think about streamlining these things but I 100% agree that these "nice-to-have"s should be left for future PRs.

> > Finally, I think it would be great to eventually support optimization over multiple weather years at once, but this might actually somehow be easier with a scenario-based approach.
>
> Yes, that's a feature I would also want, but I would propose this for a separate PR after this one (just to keep the changes somewhat digestible). It requires a rule to merge cutouts, but I think you already implemented this in https://github.com/aleks-g/intersecting-near-opt-spaces/, right? In the scripts, there shouldn't be too many hurdles as far as I can see. I know of one I built on purpose for now in pop_weighted_energy_totals, which should be straightforward to resolve.

Yes, the merging of cutouts was quite simple, see https://github.com/koen-vg/pypsa-eur/blob/7ea264f2744a8bd165f9301f23dfc0e2c4b86c6a/scripts/build_renewable_profiles.py#L220-L224 (from the pypsa-eur commit included in the repo you linked). It's a bit quirky that we found we had to supply a filename to the atlite.Cutout constructor and used a named temporary file for this, but it should never be written to. Maybe there is a better way, but it did work fine.

I can see if I have any time going forward to open a PR for this. The approach I've used before is to write an input function for the relevant rules (mainly build_renewable_profiles and siblings) which determines which cutouts are needed (based in this case on the snapshot config section), and then merge them inside the build_renewable_profiles rule. But that doesn't mesh very well with the current approach of using the config to manage which cutout is for which year. I can see that one could write a rule to produce cutouts of the form europe-era5-2010-2020 and then put that in the config, but I will say that this would take quite a lot of disk space. Not sure if there's a good way around that. Just some loose thoughts, should maybe be discussed elsewhere anyway.

I'm looking forward to seeing this merged; I definitely agree it makes sense to get the basic functionality merged at first and worry about additional features in later PRs. I'll be using the multiyear functionality quite actively in the next couple of months and I'll be sure to nail down and fix any potential issues I find on the way. Thanks so much for the amazing work!

koen-vg avatar Mar 15 '24 10:03 koen-vg

Wow! Great job @fneum and @koen-vg !

FabianHofmann avatar Mar 15 '24 15:03 FabianHofmann

only after looming around for 3.5 years (Oct 20, 2020) ;)

fneum avatar Mar 15 '24 15:03 fneum