🐛[BUG]: GraphCast example with synthetic dataset requires ERA5 metadata
Version
0.7.0
On which installation method(s) does this occur?
Docker
Describe the issue
Hi,
I started a new project with the GraphCast example. I wanted to test it on the synthetic dataset before downloading the ERA5 data, but it turned out that loss.py requires a metadata file from the ERA5 dataset, which I don't have yet.
I installed Modulus from the Docker image (24.07), installed the missing mlflow package with pip, changed num_samples_per_year_train to 1 so that it fits into memory, and started training with the synthetic dataset: python train_graphcast.py synthetic_dataset=true.
Training then fails because the loss function tries to open the ERA5 metadata file.
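For illustration, the failure can be anticipated with a small pre-flight check on the metadata path before the loss function is constructed. This is only a minimal sketch: the helper name check_metadata and its return values are hypothetical and not part of Modulus, and the path is simply the one reported in the error message.

```python
from pathlib import Path


def check_metadata(path: str, synthetic: bool) -> str:
    """Hypothetical pre-flight helper (NOT part of Modulus).

    Reports whether the dataset metadata file exists, and whether its
    absence is expected because a synthetic dataset was requested.
    """
    if Path(path).is_file():
        return "found"
    if synthetic:
        # The synthetic run has no ERA5 download, so the file is expected to be absent.
        return "missing-but-synthetic"
    return "missing"


if __name__ == "__main__":
    # Path taken from the FileNotFoundError in this report.
    print(check_metadata("/data/era5_75var/metadata/data.json", synthetic=True))
```

A check like this only makes the situation visible earlier; it does not change what loss.py itself requires.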
Minimum reproducible example
python train_graphcast.py synthetic_dataset=true
Relevant log output
```shell
root@be1b9fafbee5:/data/codes/modulus/examples/weather/graphcast# python train_graphcast.py synthetic_dataset=true
/usr/local/lib/python3.10/dist-packages/modulus/distributed/manager.py:346: UserWarning: Could not initialize using ENV, SLURM or OPENMPI methods. Assuming this is a single process job
warn(
[11:10:46 - main - INFO] Rank: 0, Device: cuda:0
[11:10:46 - main - WARNING] Using Dummy dataset. Ignoring static dataset, cosine zenith angle, time of the year, and history. Also setting num_workers to 0.
[11:10:47 - main - INFO] Using torch.bfloat16 dtype
[11:10:47 - main - WARNING] Static dataset path is not provided. Setting num_channels_static to 0.
[11:10:57 - main - INFO] Model parameter count is 35296329
Generated synthetic temperature data in 4.07 seconds.
[11:11:02 - main - INFO] Loaded training datapipe of size 0
Error executing job with overrides: ['synthetic_dataset=true']
Traceback (most recent call last):
File "/data/codes/modulus/examples/weather/graphcast/train_graphcast.py", line 349, in main
trainer = GraphCastTrainer(cfg, dist, rank_zero_logger)
File "/data/codes/modulus/examples/weather/graphcast/train_graphcast.py", line 211, in __init__
self.criterion = GraphCastLossFunction(
File "/usr/local/lib/python3.10/dist-packages/modulus/utils/graphcast/loss.py", line 129, in __init__
self.channel_dict = self.get_channel_dict(dataset_metadata_path, channels_list)
File "/usr/local/lib/python3.10/dist-packages/modulus/utils/graphcast/loss.py", line 173, in get_channel_dict
with open(dataset_metadata_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/era5_75var/metadata/data.json'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```
Environment details
docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --runtime nvidia --rm -it nvcr.io/nvidia/modulus/modulus:xx.xx bash
pip install mlflow
May I ask if you have found this file?
Hi, I've downloaded the dataset using modulus/examples/weather/dataset_download, which produces a metadata.json. Right now I am trying to run the code with that file, but I still don't have a working test, so I don't know if it's the right one.
Also, I'm not sharing the file because I'm unsure about the copyright terms of the dataset.
OK, thank you!
Hmm, no, this file doesn't work for me :(
You should be able to run the training with synthetic data as of this commit. For running the training with the full dataset, please run the download_dataset example and open a new issue with more details if that example fails to produce the proper metadata file.