Changed Dataformat from ERA5 Single Levels: .nc to .zip

Open TimFuermann opened this issue 1 year ago • 2 comments

Version Checks (indicate both or one)

  • [x] I have confirmed this bug exists on the latest release of atlite.

  • [X] I have confirmed this bug exists on the current master branch of atlite.

Issue Description

Downloading ERA5 single-level data from CDS has changed. Specifically, a request with "data_format": "netcdf" now returns a .zip file containing several separate .nc files by default when multiple fields are downloaded at once.

I suggest changing the retrieve_data function in the era5.py dataset module so that it downloads a .zip file by default and combines all .nc files within it directly.
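A minimal sketch of what such handling could look like, assuming the download is either a single NetCDF file or a .zip archive of .nc members. The helper names extract_netcdf_members and open_cds_download are hypothetical and not part of atlite, and merging with xarray.merge assumes the member files do not conflict with each other.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_netcdf_members(archive_path, target_dir):
    """Extract every .nc member of a zip archive and return the extracted paths."""
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith(".nc"):
                # ZipFile.extract returns the path of the file it created
                extracted.append(Path(archive.extract(name, path=target_dir)))
    return extracted


def open_cds_download(path):
    """Open a CDS download that is either one .nc file or a zip of .nc files."""
    import xarray as xr  # imported lazily; only needed when data is opened

    if not zipfile.is_zipfile(path):
        return xr.open_dataset(path)

    with tempfile.TemporaryDirectory() as tmp:
        datasets = []
        for member in extract_netcdf_members(path, tmp):
            # load() reads the data into memory before the temporary file is removed
            with xr.open_dataset(member) as ds:
                datasets.append(ds.load())
    return xr.merge(datasets)
```

Checking with zipfile.is_zipfile rather than the file extension keeps the helper working regardless of which name the download was saved under.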

I will link the changes in a few minutes.

Reproducible Example

import atlite

cutout = atlite.Cutout(
    path="western-europe-2011-01.nc",
    module="era5",
    x=slice(-13.6913, 1.7712),
    y=slice(49.9096, 60.8479),
    time="2011-01",
)

cutout.prepare()

Expected Behavior

The expected behavior is for the cutout to be returned. Instead, the code fails with a ValueError (see below). The error arises because we try to load a .zip file as a .nc file; the archive needs to be unzipped first.

*** ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'scipy', 'rasterio']. Consider explicitly selecting one of the installed engines via the `engine` parameter, or installing additional IO dependencies, see: https://docs.xarray.dev/en/stable/getting-started-guide/installing.html https://docs.xarray.dev/en/stable/user-guide/io.html
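This kind of mismatch can be detected before xarray is involved by looking at the first bytes of the downloaded file. The sketch below is illustrative only: sniff_format is a hypothetical helper, not an atlite or xarray function, and it only recognizes the container formats mentioned in this issue.

```python
# Leading bytes ("magic numbers") of the container formats relevant here.
_SIGNATURES = (
    (b"PK\x03\x04", "zip"),             # non-empty zip archive
    (b"\x89HDF\r\n\x1a\n", "netcdf4"),  # NetCDF-4 files are HDF5 containers
    (b"CDF", "netcdf3"),                # classic NetCDF
    (b"GRIB", "grib"),
)


def sniff_format(path):
    """Return a format label based on the first bytes of the file, or 'unknown'."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for signature, label in _SIGNATURES:
        if head.startswith(signature):
            return label
    return "unknown"
```

A file named western-europe-2011-01.nc for which sniff_format returns "zip" would explain the ValueError above: the extension says NetCDF, the content does not.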

Installed Versions

Local version of the master branch. Added: IO Backend for Xarray -> ['h5netcdf']

TimFuermann, Dec 03 '24 15:12

@TimFuermann, do you have an MWE which fails? With a fresh install today, the reproducible example worked without problems.

fneum, Jan 02 '25 13:01

@fneum, I just tested this with a fresh installation, and it now works fine for me. However, I still feel that the current method of requesting data via the CDS API is not up-to-date and could potentially fail in the future (depending on how consistently CDS/ECMWF handles API requests).

Here is one of the five final requests generated by running the MWE (which unfortunately works) provided above:

[Screenshot: one of the CDS API requests generated by the MWE]

As you can see, the request appears incomplete. Some major parts of the request, such as specifying the data format as .nc, are missing. Nevertheless, when processing the request, these fields seem to be filled in either by default or based on other parameters. You can see this in the next image:

[Screenshot: the same request as processed by CDS]

Even here, the data format is not the requested NetCDF files. Instead, the data is obtained as GRIB files.

The reason why this is now working puzzles me. There have been recent discussions in the ECMWF forum about the exact problem I mentioned earlier: https://forum.ecmwf.int/t/cdsapi-reanalysis-single-level-is-being-download-as-a-zip-when-asked-for-unarchived/10172

The issue seems to be related to the following update: https://forum.ecmwf.int/t/forthcoming-update-to-the-format-of-netcdf-files-produced-by-the-conversion-of-grib-data-on-the-cds/7772

In my opinion, it would be worth updating the request syntax (similar to what I suggested earlier) to align with the latest CDS API standards. This would ensure that the requests are up-to-date and allow us to verify online if the request is truly as intended.
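As a rough illustration of what an explicit request could look like, here is a sketch that spells out every field instead of relying on server-side defaults. The function build_era5_request is hypothetical, and the key names (in particular data_format and download_format) reflect my understanding of the current CDS API; they should be checked against the "Show API request" output on the CDS dataset page. The area values are taken from the cutout in the reproducible example.

```python
def build_era5_request(variables, year, month, days, hours, area,
                       data_format="netcdf", download_format="unarchived"):
    """Assemble an explicit request dict; nothing is left to server defaults."""
    return {
        "product_type": ["reanalysis"],
        "variable": list(variables),
        "year": [f"{year:04d}"],
        "month": [f"{month:02d}"],
        "day": [f"{d:02d}" for d in days],
        "time": [f"{h:02d}:00" for h in hours],
        "area": list(area),  # [north, west, south, east]
        "data_format": data_format,
        "download_format": download_format,
    }


request = build_era5_request(
    variables=["2m_temperature"],
    year=2011,
    month=1,
    days=range(1, 32),
    hours=range(24),
    area=[60.8479, -13.6913, 49.9096, 1.7712],
)

# Submitting the request needs the cdsapi package and CDS credentials:
# import cdsapi
# cdsapi.Client().retrieve("reanalysis-era5-single-levels", request, "era5.nc")
```

Because the dict is complete, it can be compared field by field with the request shown in the CDS web interface.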

TimFuermann, Jan 07 '25 16:01

Hi @TimFuermann,

I think we can close the issue, since we have updated the request to the CDS API in #439. Would you agree?

euronion, Jul 15 '25 15:07

@euronion,

looking at your changes, I would say: yes, if GRIB download is now the preferred way of downloading (which I think is safer anyway), then the issue can be closed. Nevertheless, I would suggest making certain that the passed variables are in the form the CDS API requires. This was not always the case in previous versions, in particular for the time variables, and it once led to issues during the CDS migration. This is just to avoid problems in the future in case there are changes on the CDS side.
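One way to do this is a small check that runs before the request is sent. The sketch below is only an example: check_time_fields is a hypothetical helper, and the assumed format (zero-padded strings such as "2011", "01" and "00:00") is my assumption about what the CDS API expects, not something taken from atlite.

```python
import re

# Assumed string formats for the time-related request fields.
_PATTERNS = {
    "year": re.compile(r"\d{4}"),
    "month": re.compile(r"(0[1-9]|1[0-2])"),
    "day": re.compile(r"(0[1-9]|[12]\d|3[01])"),
    "time": re.compile(r"([01]\d|2[0-3]):[0-5]\d"),
}


def check_time_fields(request):
    """Return a list of problems found in the time-related fields of a request."""
    problems = []
    for key, pattern in _PATTERNS.items():
        values = request.get(key, [])
        if isinstance(values, str):
            values = [values]  # accept a single string as well as a list
        for value in values:
            if not isinstance(value, str) or not pattern.fullmatch(value):
                problems.append(f"{key}: {value!r}")
    return problems
```

An empty list means all time fields match the assumed format; anything else names the offending field and value, for example an integer year or a month that is not zero-padded.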

TimFuermann, Jul 16 '25 07:07

Thanks! Yes, I agree on monitoring it more closely. Due to circumstances I can also now keep a closer eye on the CDS API and potential changes.

If you spot anything again, please feel free to raise another issue! Thanks for your responses and contributions! :)

euronion, Jul 16 '25 09:07