🐛[BUG]: GraphCast example with synthetic dataset requires ERA5 metadata

Open negedng opened this issue 6 months ago • 4 comments

Version

0.7.0

On which installation method(s) does this occur?

Docker

Describe the issue

Hi,

I started a new project with the GraphCast example. I wanted to test it on the synthetic dataset before downloading the ERA5 data, but it turned out that loss.py requires a metadata file from the ERA5 dataset, which is not available in that case.

I installed Modulus from Docker (24.07), installed the missing mlflow package with pip, changed num_samples_per_year_train to 1 so that it fits into memory, and started training with the synthetic dataset: python train_graphcast.py synthetic_dataset=true.

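Training then fails because the loss function tries to open the ERA5 metadata file (/data/era5_75var/metadata/data.json), which does not exist.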
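For illustration, a guard along the following lines would let the synthetic path run without the metadata file. This is only a sketch, not the Modulus implementation: the helper name `load_channel_names`, the `synthetic` flag, the assumed JSON layout (`{"coords": {"channel": [...]}}`), and the placeholder channel names are all assumptions made for this example.

```python
import json
import os


def load_channel_names(metadata_path, channels_list, synthetic=False):
    """Return one name per requested channel index.

    Hypothetical helper, not part of Modulus. Assumes the metadata JSON
    looks like {"coords": {"channel": ["name0", "name1", ...]}}.
    """
    if metadata_path and os.path.isfile(metadata_path):
        # Metadata is available: read the real channel names.
        with open(metadata_path, "r") as f:
            names = json.load(f)["coords"]["channel"]
        return [names[i] for i in channels_list]

    if synthetic:
        # No ERA5 metadata on disk: fall back to placeholder names
        # so the synthetic run does not need the file at all.
        return [f"synthetic_channel_{i}" for i in channels_list]

    # Real data without metadata is still an error, but a clearer one.
    raise FileNotFoundError(
        f"Dataset metadata not found at {metadata_path!r}; "
        "provide the file or run with the synthetic dataset."
    )
```

With a guard like this, a missing metadata path would only raise when real ERA5 data is requested, while a synthetic run would get placeholder channel names instead.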

Minimum reproducible example

python train_graphcast.py synthetic_dataset=true

Relevant log output

```shell
root@be1b9fafbee5:/data/codes/modulus/examples/weather/graphcast# python train_graphcast.py synthetic_dataset=true
/usr/local/lib/python3.10/dist-packages/modulus/distributed/manager.py:346: UserWarning: Could not initialize using ENV, SLURM or OPENMPI methods. Assuming this is a single process job
  warn(
[11:10:46 - main - INFO] Rank: 0, Device: cuda:0
[11:10:46 - main - WARNING] Using Dummy dataset. Ignoring static dataset, cosine zenith angle,                                time of the year, and history. Also setting num_workers to 0.
[11:10:47 - main - INFO] Using torch.bfloat16 dtype
[11:10:47 - main - WARNING] Static dataset path is not provided. Setting num_channels_static to 0.
[11:10:57 - main - INFO] Model parameter count is 35296329
Generated synthetic temperature data in 4.07 seconds.
[11:11:02 - main - INFO] Loaded training datapipe of size 0
Error executing job with overrides: ['synthetic_dataset=true']
Traceback (most recent call last):
  File "/data/codes/modulus/examples/weather/graphcast/train_graphcast.py", line 349, in main
    trainer = GraphCastTrainer(cfg, dist, rank_zero_logger)
  File "/data/codes/modulus/examples/weather/graphcast/train_graphcast.py", line 211, in __init__
    self.criterion = GraphCastLossFunction(
  File "/usr/local/lib/python3.10/dist-packages/modulus/utils/graphcast/loss.py", line 129, in __init__
    self.channel_dict = self.get_channel_dict(dataset_metadata_path, channels_list)
  File "/usr/local/lib/python3.10/dist-packages/modulus/utils/graphcast/loss.py", line 173, in get_channel_dict
    with open(dataset_metadata_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/era5_75var/metadata/data.json'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```

Environment details

docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --runtime nvidia --rm -it nvcr.io/nvidia/modulus/modulus:xx.xx bash

pip install mlflow

negedng, Aug 06 '24 11:08