Atlite ESGF interface for downloading and preparing CMIP6 data

Open Ovewh opened this issue 3 years ago • 10 comments

Change proposed in this Pull Request

Add an interface in atlite to the ESGF CMIP database for downloading and preparing CORDEX and CMIP6 data.

Description

An interface in atlite for working with climate model output has been developed. An example of how to use it is available in examples/cmip_interface_example.ipynb. The search parameters for the ESGF database can be specified either as a dictionary when setting up the cutout or in the atlite/datasets/cmip.yaml file. The search parameters have to be determined manually by searching the ESGF database.

Variables required by atlite are:

  • rsds, surface downwelling shortwave radiation
  • rsus, surface upwelling shortwave radiation
  • sfcWind, wind speed at 10 m
  • mrro, runoff
  • tas, surface temperature

This is similar to the variables required by the old CORDEX interface; however, surface roughness doesn't seem to be available from CMIP. Based on the provided search parameters, the interface uses the pyesgf.search Python API to find matching results; if there is more than one result, it takes the most recent one. The OPeNDAP URLs for that result are then obtained and can be loaded lazily using xarray. This means that the data can be subset according to the cutout, and the download and computation are only triggered by cutout.prepare(). Be aware that some models and data servers don't provide OPeNDAP URLs, which means that you might have to try different ensembles to find a model that does. The current example uses data from the EC-Earth3 model. It is also possible to download netCDF files with 1 year of data individually; however, this hasn't been implemented in atlite. That might be more robust, but at least so far the OPeNDAP interface in xarray has been working flawlessly.
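
The "take the most recent result" step can be sketched without touching the network. This is a minimal illustration, not the code in this pull request: it assumes that each search hit is identified by a dataset id ending in a version stamp of the form .vYYYYMMDD, optionally followed by |<data node>. The function name latest_result and the example ids are hypothetical.

```python
import re


def latest_result(dataset_ids):
    """Return the dataset id carrying the highest version stamp.

    Assumption: ids end in '.vYYYYMMDD', optionally followed by '|<data node>'.
    Ids without a recognisable version stamp sort last.
    """

    def version(dataset_id):
        match = re.search(r"\.v(\d{8})(?:\||$)", dataset_id)
        return int(match.group(1)) if match else -1

    return max(dataset_ids, key=version)


# Hypothetical ids, for illustration only.
ids = [
    "CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp245.r1i1p1f1.3hr.tas.gr.v20200310|esgf.example.org",
    "CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp245.r1i1p1f1.3hr.tas.gr.v20210517|esgf.example.org",
]
print(latest_result(ids))
```

Comparing the version stamp as an integer keeps the choice deterministic when several data nodes return the same dataset in different versions.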

Caveats:

The highest temporal resolution available in CMIP is 3hr; however, some models provide only some of the variables required by atlite at 3hr resolution, while the others are at 6hr resolution. CMIP6 also has quite a coarse spatial resolution of ~100 km; CORDEX has a higher resolution. Surface roughness is not available in CMIP; currently, averaged roughness is taken from ERA5.

Motivation and Context

Explore influence of future climate change on energy systems. Related issue #59

How Has This Been Tested?

The functionality has been tested for calculating wind and PV capacities. Tested with Python > 3.9.

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • [x] I tested my contribution locally and it seems to work fine.
  • [x] I locally ran pytest inside the repository and no unexpected problems came up.
  • [x] I have adjusted the docstrings in the code appropriately.
  • [ ] I have documented the effects of my code changes in the documentation doc/.
  • [x] I have added newly introduced dependencies to environment.yaml file.
  • [ ] I have added a note to release notes doc/release_notes.rst.
  • [ ] I have used pre-commit run --all to lint/format/check my contribution

Ovewh avatar Jun 29 '21 12:06 Ovewh

REUSE compliance requires a comment with the license at the beginning of each new file (it can just be copied from the other .py/.yaml files). Could you also merge the up-to-date master so that the tests run through?

FabianHofmann avatar Jun 29 '21 13:06 FabianHofmann

@FabianHofmann Something I would like your thoughts on: as I mentioned, the surface roughness isn't available from CMIP, and I have yet to address this issue in my code. My idea is to just have a keyword argument with the path to some external roughness dataset; however, I could also make atlite prepare a static roughness dataset from ERA5. At least requiring the path to the roughness dataset to be provided when the cutout is created would avoid any confusion about the data source.

Ovewh avatar Jun 29 '21 18:06 Ovewh

@Ovewh For retrieving features from other sources, one has to add the module to the cutout. But: the surface roughness is only used for extrapolating the wind speed to the turbine hub height. But the ESGF can retrieve wind speed at arbitrary heights, right?

FabianHofmann avatar Jul 05 '21 09:07 FabianHofmann

> @Ovewh For retrieving features from other sources, one has to add the module to the cutout. But: the surface roughness is only used for extrapolating the wind speed to the turbine hub height. But the ESGF can retrieve wind speed at arbitrary heights, right?

No, only a few models provide wind speed at 100 m; most models provide only the surface wind speed. So the wind speed has to be extrapolated.
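
To make the role of the roughness concrete, here is a minimal sketch of extrapolating a surface wind speed to hub height with the logarithmic wind profile, v(h) = v(h_ref) · ln(h / z0) / ln(h_ref / z0). It is an illustration under stated assumptions (neutral stratification, a 10 m reference height, a hypothetical function name), not necessarily the exact routine atlite uses.

```python
import math


def extrapolate_wind_speed(speed_ref, roughness, height_ref=10.0, height_hub=100.0):
    """Scale a wind speed from height_ref to height_hub using the log law.

    speed_ref: wind speed at height_ref in m/s
    roughness: surface roughness length z0 in m (must satisfy 0 < z0 < height_ref)
    """
    if roughness <= 0 or roughness >= height_ref:
        raise ValueError("roughness must satisfy 0 < roughness < height_ref")
    return speed_ref * math.log(height_hub / roughness) / math.log(height_ref / roughness)


# 5 m/s at 10 m over terrain with z0 = 0.1 m, extrapolated to 100 m.
print(extrapolate_wind_speed(5.0, 0.1))
```

Because the roughness enters only through the ratio of two logarithms, the result changes slowly with z0, which is why an externally supplied roughness field can stand in when the climate model provides none.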

Ovewh avatar Jul 05 '21 13:07 Ovewh

> No, only a few models provide wind speed at 100 m; most models provide only the surface wind speed. So the wind speed has to be extrapolated.

Okay, then the roughness data has to come from the ERA5 dataset. atlite allows mixing data sources, so the best way would be to retrieve all variables from ESGF and fill in the rest with ERA5 data, which in principle is done with

cutout = atlite.Cutout('my_cutout', module=['esgf', 'era5'], time=....)

Then it will retrieve all available features from esgf and then fill in the missing variables (in this case the roughness data) from era5. Could you try that out?
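
The fill-in order described here can be pictured as a first-match lookup over the module list. The sketch below only illustrates that idea; it is not atlite's implementation, and the function name, the variable sets and the label "roughness" are hypothetical.

```python
def assign_sources(requested, available_by_module, modules):
    """Map each requested variable to the first module in `modules` offering it."""
    assignment = {}
    for variable in requested:
        for module in modules:
            if variable in available_by_module.get(module, ()):
                assignment[variable] = module
                break
        else:
            # No module in the list provides this variable.
            raise KeyError(f"no module provides {variable!r}")
    return assignment


# Illustrative availability only: esgf lacks roughness, era5 can supply it.
available = {
    "esgf": {"rsds", "rsus", "sfcWind", "mrro", "tas"},
    "era5": {"rsds", "rsus", "sfcWind", "mrro", "tas", "roughness"},
}
requested = ["sfcWind", "tas", "roughness"]
print(assign_sources(requested, available, ["esgf", "era5"]))
```

Listing esgf first means ERA5 is consulted only for variables the climate model output lacks.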

FabianHofmann avatar Jul 08 '21 12:07 FabianHofmann

> > No, only a few models provide wind speed at 100 m; most models provide only the surface wind speed. So the wind speed has to be extrapolated.
>
> Okay, then the roughness data has to come from the ERA5 dataset. atlite allows mixing data sources, so the best way would be to retrieve all variables from ESGF and fill in the rest with ERA5 data, which in principle is done with
>
> cutout = atlite.Cutout('my_cutout', module=['esgf', 'era5'], time=....)
>
> Then it will retrieve all available features from esgf and then fill in the missing variables (in this case the roughness data) from era5. Could you try that out?

@FabianHofmann Yes, so the issue is that CMIP contains future climate projections, whereas ERA5 is a reanalysis. It only makes sense to take the averaged roughness from ERA5, based on either one year or a single month. I did some sensitivity tests calculating capacity factors using constant and forecasted roughness for ERA5. There was only a slight difference in the offshore capacities.

Ovewh avatar Jul 08 '21 12:07 Ovewh

Let's also have a look at https://py-cordex.readthedocs.io/en/stable/index.html

FabianHofmann avatar Dec 10 '21 15:12 FabianHofmann

@FabianHofmann It doesn't look like py-cordex has an interface for downloading data, but I have not worked with the CORDEX data.

My first attempt at creating a CMIP interface turned out to be a bit of a dead end. Integrating the download of CMIP data directly into atlite did not work out that well, since the CMIP data files are formatted slightly differently from model to model (e.g. some models provide yearly files and others put 10 years in one file, and the models also use different calendars). It is probably simpler and more robust to make a very general interface for sideloading locally stored climate and weather data into atlite. Then it would be up to the user to preprocess the data into a format that atlite can understand.
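
As a rough idea of what the user-side preprocessing would have to guarantee, here is a minimal pre-flight check for sideloaded data. It is a sketch under assumptions: the required variable names are the ones listed in the pull request description, the calendar names are the standard CF ones, and the function name and the rule that only standard calendars pass are hypothetical.

```python
REQUIRED_VARIABLES = ("rsds", "rsus", "sfcWind", "mrro", "tas")
STANDARD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}


def preflight_problems(variable_names, calendar):
    """List the reasons a locally stored dataset is not ready for sideloading."""
    problems = [
        f"missing variable: {name}"
        for name in REQUIRED_VARIABLES
        if name not in variable_names
    ]
    # Assumption: calendars such as 'noleap' or '360_day' need conversion first.
    if calendar not in STANDARD_CALENDARS:
        problems.append(f"non-standard calendar: {calendar}")
    return problems


print(preflight_problems({"rsds", "tas"}, "360_day"))
```

An empty list means the dataset passes this check; anything else tells the user what to fix before handing the data over.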

Ovewh avatar Dec 10 '21 17:12 Ovewh

> It is probably simpler and more robust to make a very general interface for sideloading locally stored climate and weather data into atlite. Then it would be up to the user to preprocess the data into a format that atlite can understand.

Interesting! This would be similar to what we have for the SARAH2 dataset (cutout(...) gets called with an additional argument sarah_dir pointing to a local directory containing the manually downloaded SARAH2 data, due to the lack of an API).

(Just a comment)

Outsider question (I'm not familiar with CMIP/CORDEX datasets): Is there a central repository from which one can manually download the data?

euronion avatar Dec 12 '21 17:12 euronion

> Interesting! This would be similar to what we have for the SARAH2 dataset (cutout(...) gets called with an additional argument sarah_dir pointing to a local directory containing the manually downloaded SARAH2 data, due to the lack of an API).

Yes, that's my idea, though perhaps even more general: instead of a path, it would be an xarray.Dataset.

> Outsider question (I'm not familiar with CMIP/CORDEX datasets): Is there a central repository from which one can manually download the data?

Yes, all the CMIP6/CORDEX data is stored at ESGF data nodes. ESGF provides different ways of downloading the data, e.g. OPeNDAP and wget scripts.

Ovewh avatar Dec 12 '21 20:12 Ovewh