
🐛[BUG]: ERA5 dataset_download example fails

Open negedng opened this issue 1 year ago • 5 comments

Version

0.7.0

On which installation method(s) does this occur?

Docker

Describe the issue

I am trying to follow the dataset download example from https://github.com/NVIDIA/modulus/tree/main/examples/weather/dataset_download, but it fails with a chunk key error.

There was a CDS update, so this might be caused by a change in the API.

It seems like "time" was replaced with "valid_time".

Minimum reproducible example

python start_mirror.py

Relevant log output

root@be1b9fafbee5:/data/codes/modulus/examples/weather/dataset_download# python start_mirror.py
Downloading data for 1980-1
[                                        ] | 0% Completed | 64.96 s                                                                                                                                         
Error executing job with overrides: []
Traceback (most recent call last):
  File "/data/codes/modulus/examples/weather/dataset_download/start_mirror.py", line 51, in main
    zarr_paths = mirror.download(cfg.variables, date_range, hours)
  File "/data/codes/modulus/examples/weather/dataset_download/era5_mirror.py", line 305, in download
    dask.compute(*tasks)
  File "/usr/local/lib/python3.10/dist-packages/dask/base.py", line 665, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/data/codes/modulus/examples/weather/dataset_download/era5_mirror.py", line 209, in download_and_upload_chunk
    ds = ds.chunk(chunking)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py", line 2726, in chunk
    raise ValueError(
ValueError: chunks keys ('time',) not found in data dimensions ('valid_time', 'latitude', 'longitude')

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Environment details

modulus 0.7.0 with docker 24.07
cds-beta

negedng avatar Aug 06 '24 13:08 negedng

Yes, I had the same issue. Looking at the error message, you should replace "time" with "valid_time" on line 204 of era5_mirror.py. Doing this, I get files in the zarr_data folder.
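The workaround can also be sketched without hard-coding either name: remap the chunking dict passed to ds.chunk() onto whatever dimension names the dataset actually reports. The helper below is hypothetical (it is not part of era5_mirror.py) and assumes dims behaves like xarray's ds.dims mapping:

```python
# Hypothetical helper: rewrite chunk keys that the CDS API renamed
# (e.g. "time" -> "valid_time") so ds.chunk() no longer raises ValueError.
RENAMES = {"time": "valid_time"}

def remap_chunking(chunking, dims):
    """Return a chunking dict whose keys all exist in `dims`.

    `dims` is the dataset's dimension mapping (like xarray's ds.dims);
    keys missing from `dims` are translated via RENAMES when possible.
    """
    fixed = {}
    for key, size in chunking.items():
        if key in dims:
            fixed[key] = size
        elif RENAMES.get(key) in dims:
            fixed[RENAMES[key]] = size
        else:
            raise KeyError(
                f"chunk key {key!r} not found in dimensions {tuple(dims)}"
            )
    return fixed
```

With a helper like this, the failing call could become something like ds.chunk(remap_chunking(chunking, ds.dims)), so the script keeps working whether the API returns "time" or "valid_time".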

elasto avatar Oct 03 '24 13:10 elasto

Yes, I've done the same

negedng avatar Oct 03 '24 13:10 negedng

However, I experience a segmentation fault during the download when I try to run start_mirror.py with config_34var.yaml.

elasto avatar Oct 08 '24 12:10 elasto

Try it again; it saves parts of the download. I think connection issues are not handled well. I managed to download the data, but I had to restart it a few times. Alternatively, you can try the stuff here: https://github.com/NVIDIA/modulus/tree/main/examples/weather/unified_recipe

Let me know if you have running training code, because I'm still struggling on a later step :)

negedng avatar Oct 08 '24 12:10 negedng

> Try it again; it saves parts of the download. I think connection issues are not handled well. I managed to download the data, but I had to restart it a few times. Alternatively, you can try the stuff here: https://github.com/NVIDIA/modulus/tree/main/examples/weather/unified_recipe
>
> Let me know if you have running training code, because I'm still struggling on a later step :)

Did you replace "time" with "valid_time" everywhere? Or did you also patch some other parts of the files (era5_mirror.py, start_mirror.py)? I am still fighting with the download, and I get another error message: ValueError: append_dim='valid_time' does not match any existing dataset dimensions {}

elasto avatar Oct 09 '24 08:10 elasto

Hey @elasto and @negedng,

Sorry for the late reply on this. I would recommend the download and curation scripts in the unified recipe if possible. They use ARCO ERA5 to get the data, which is much easier than using the CDS API. I'll take another look today at the dataset_download folder to see if I can replicate the error. For me, the CDS API seemed to completely stop working last month, so I'm not sure what's up with that.

I'll mention that we are working on much better utils for this in Earth2Studio. There is a very rough prototype here: https://github.com/loliverhennigh/earth2studio/blob/arco_caching/examples/09_build_dataset.py. It uses Apache Beam and checkpoints the download, making it much easier. We don't have an ETA on this yet, but there have been a lot of discussions about the best way to do it. Hopefully soon.

Did you have issues with training using the unified recipe as well?

loliverhennigh avatar Oct 15 '24 18:10 loliverhennigh

I found that "time" needs to be changed to "valid_time" in:

  • era5_mirror.py: lines 204 and 214
  • start_mirror.py: lines 58, 69, and 75
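Rather than editing the hard-coded name at all five of those sites, one option is a small helper that picks whichever time dimension the downloaded file actually exposes. This is a hypothetical sketch (the helper is not part of the repo; dims stands for any collection of dimension names, such as xarray's ds.dims):

```python
def time_dim_name(dims):
    """Return whichever time dimension name the dataset exposes.

    Newer CDS responses use "valid_time"; older ones use "time".
    `dims` is any iterable of dimension names (e.g. xarray's ds.dims).
    """
    for candidate in ("valid_time", "time"):
        if candidate in dims:
            return candidate
    raise KeyError(f"no time dimension found in {tuple(dims)}")
```

Each of the five call sites could then use time_dim_name(ds.dims) instead of the literal string, so the scripts survive future renames on either side of the API change.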

Also, the segmentation fault is likely happening because none of the h5 files are being closed; see https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets. It would be great if this issue could be fixed.
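The handle-leak pattern behind that segfault is general: each opened file should be closed deterministically inside the download loop rather than left to the garbage collector. A minimal, self-contained sketch of the pattern (plain file handles stand in for the h5/xarray datasets here; process_files is a hypothetical name, not a function in the repo):

```python
def process_files(paths):
    """Open, consume, and close each file before moving to the next.

    Leaving handles open (as the xarray multi-file I/O docs warn) can
    exhaust file descriptors or crash native HDF5 code. The `with`
    block guarantees each handle is released, even on error.
    """
    total_bytes = 0
    for path in paths:
        with open(path, "rb") as f:  # handle closed at end of each iteration
            total_bytes += len(f.read())
    return total_bytes
```

For the xarray case, the equivalent fix would be wrapping each xr.open_dataset(...) in a with block (or calling ds.close() in a finally clause) before appending the next chunk.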

dyeosy98 avatar Jan 09 '25 06:01 dyeosy98

Addressed by https://github.com/NVIDIA/physicsnemo/pull/845

pzharrington avatar May 07 '25 19:05 pzharrington