earth2mip

🐛[BUG]: Unable to run inference_ensemble for models other than fcnv2_sm

Open david5010 opened this issue 1 year ago • 5 comments

Version

main

On which installation method(s) does this occur?

Source

Describe the issue

I followed the example provided here, but it seems to work only with fcnv2_sm. Running DLWP, GraphCast, and Pangu each produced a different issue:

DLWP: RuntimeError: MKL FFT error: Intel MKL DFTI ERROR: Inconsistent configuration parameters, occurring at ensemble_utils.py line 171

GraphCast: Missing metadata.json (unsure how to download the relevant package)

Pangu: This one might be working but I have issues with onnxruntime finding my GPU

Environment details

git clone
pip install .

david5010 • Jul 09 '24 14:07

Thanks for the report. It seems like there are a few issues here. I'd guess part of it is related to your installation, since it looks like you are not using the GPU for either Pangu or DLWP inference.

It may also be easier to start with one of the simpler examples, e.g. https://nvidia.github.io/earth2mip/examples/02_model_comparison.html. I'm not sure we've used ensemble inference with DLWP.
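A minimal sketch for checking whether either runtime can see a GPU, assuming DLWP runs through PyTorch and Pangu through ONNX Runtime as described in this thread. The function name `gpu_report` is hypothetical and not part of earth2mip.

```python
import importlib.util


def gpu_report():
    """Report whether PyTorch and ONNX Runtime are installed and can see a GPU."""
    report = {}

    # PyTorch: assumed here to be the backend behind the DLWP run.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = "not installed"
    else:
        import torch

        report["torch"] = "cuda available" if torch.cuda.is_available() else "cpu only"

    # ONNX Runtime: the backend the Pangu report in this thread refers to.
    if importlib.util.find_spec("onnxruntime") is None:
        report["onnxruntime"] = "not installed"
    else:
        import onnxruntime as ort

        has_cuda = "CUDAExecutionProvider" in ort.get_available_providers()
        report["onnxruntime"] = "cuda provider available" if has_cuda else "cpu only"

    return report


if __name__ == "__main__":
    for name, status in gpu_report().items():
        print(f"{name}: {status}")
```

If ONNX Runtime reports "cpu only" on a machine with a working GPU, one common cause is that the CPU-only `onnxruntime` wheel is installed instead of `onnxruntime-gpu`, though that is only a guess for this particular setup.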

I could probably help more if you provide full error messages.

nbren12 • Jul 09 '24 16:07

For Pangu-Weather, I think the model is working, although my installation might have some issues with onnxruntime. As for DLWP, here's the error message. It's different when I use GFS and IFS as the data source:

With IFS initial conditions: [image: error message screenshot]

Config:

config = {
    "ensemble_members": args.members,
    "noise_amplitude": 0.05,
    "simulation_length": args.lead_time,
    "weather_event": {
        "properties": {
            "name": "Globe",
            "start_time": formatted_date,
            "initial_condition_source": 'ifs',
        },
        "domains": [
            {
                "name": "global",
                "type": "Window",
                "diagnostics": [
                    {
                        "type": "raw",
                        "channels": ["t2m", "u10m", "v10m"],
                    }
                ],
            }
        ],
    },
    # TODO: Format so that it goes into {YYYYMMDD}.t{HH-00,06,12,18}z/{init-ifs or init-gfs}/
    "output_path": f"{args.output_dir}/{args.date.strftime('%Y%m%d.t%Hz')}/ecmwfdlwp",
    "output_frequency": 1,
    "weather_model": "dlwp",
    "seed": 12345,
    "use_cuda_graphs": False,
    "ensemble_batch_size": 1,
    "autocast_fp16": False,
    "perturbation_strategy": "correlated",
    "noise_reddening": 2.0
}

With GFS: [image: error message screenshot]

Config:

config = {
    "ensemble_members": args.members,
    "noise_amplitude": 0.05,
    "simulation_length": args.lead_time,
    "weather_event": {
        "properties": {
            "name": "Globe",
            "start_time": formatted_date,
            "initial_condition_source": 'gfs',
        },
        "domains": [
            {
                "name": "global",
                "type": "Window",
                "diagnostics": [
                    {
                        "type": "raw",
                        "channels": ["t2m", "u10m", "v10m"],
                    }
                ],
            }
        ],
    },
    # TODO: Format so that it goes into {YYYYMMDD}.t{HH-00,06,12,18}z/{init-ifs or init-gfs}/
    "output_path": f"{args.output_dir}/{args.date.strftime('%Y%m%d.t%Hz')}/ecmwfdlwp",
    "output_frequency": 1,
    "weather_model": "dlwp",
    "seed": 12345,
    "use_cuda_graphs": False,
    "ensemble_batch_size": 1,
    "autocast_fp16": False,
    "perturbation_strategy": "correlated",
    "noise_reddening": 2.0
}
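Since the two configs differ only in `initial_condition_source`, they could be generated from one helper. This is just a sketch: `make_config` is a hypothetical name, and the start time, member count, lead time, and output paths below are placeholders standing in for `formatted_date` and the `args` values, which are not shown in this thread.

```python
import json


def make_config(source, start_time, members, lead_time, output_path):
    """Build the DLWP ensemble config from the post for one initial-condition source."""
    return {
        "ensemble_members": members,
        "noise_amplitude": 0.05,
        "simulation_length": lead_time,
        "weather_event": {
            "properties": {
                "name": "Globe",
                "start_time": start_time,
                "initial_condition_source": source,
            },
            "domains": [
                {
                    "name": "global",
                    "type": "Window",
                    "diagnostics": [
                        {"type": "raw", "channels": ["t2m", "u10m", "v10m"]}
                    ],
                }
            ],
        },
        "output_path": output_path,
        "output_frequency": 1,
        "weather_model": "dlwp",
        "seed": 12345,
        "use_cuda_graphs": False,
        "ensemble_batch_size": 1,
        "autocast_fp16": False,
        "perturbation_strategy": "correlated",
        "noise_reddening": 2.0,
    }


# Placeholder values; the real ones come from args and formatted_date in the post.
configs = {
    source: make_config(source, "START_TIME_PLACEHOLDER", 2, 4, f"outputs/{source}_dlwp")
    for source in ("ifs", "gfs")
}
print(json.dumps(configs["ifs"], indent=2))
```

Building both from one function makes it easier to confirm that the data source is the only thing changing between the failing runs.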

As for graphcast_operational, it says that I'm missing metadata.json. I'm unsure how to get it, even after running pip install .[graphcast].

I hope this information helps, and please let me know what other error messages I can provide!

david5010 • Jul 16 '24 19:07

Hello, I got the same error, and I also tried to run basic inference. When I import graphcast with import earth2mip.networks.graphcast as graphcast, it cannot find module named haiku for me.

ndp99VN • Jul 17 '24 18:07

> Hello, I got the same error, and I also tried to run basic inference. When I import graphcast with import earth2mip.networks.graphcast as graphcast, it cannot find module named haiku for me.

For this, try running pip install .[graphcast] at the root of the repo, and also run pip install -r requirements.txt.

Were you able to register the graphcast model? In .cache/, I only see fcnv2, pangu, and dlwp, but not graphcast.

david5010 • Jul 19 '24 16:07

May I have your email so that I can contact you for details? I'm also trying to run the DLWP, Pangu, and GraphCast models, but I get different errors with each model.

ndp99VN • Jul 28 '24 23:07

> It's different when I use GFS and IFS as the data source:

For IFS, it failed to find some or all of the required channels, either because of a bug in the data source or because the IFS data itself doesn't have some channel.

For GFS, there was some issue with the Intel MKL library. This seems like an installation/environment issue.
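One way to narrow this down is to list which required channels the data source does not provide. A minimal sketch, where `missing_channels` is a hypothetical helper and both channel lists are illustrative placeholders rather than the real DLWP inputs or IFS contents:

```python
def missing_channels(required, available):
    """Return the required channel names the data source lacks, in request order."""
    available = set(available)
    return [name for name in required if name not in available]


# Illustrative values only; substitute the model's input channels
# and the channels the data source actually exposes.
required = ["t2m", "u10m", "v10m"]
available = ["t2m", "u10m"]
print(missing_channels(required, available))  # → ['v10m']
```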
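A quick smoke test can show whether FFTs work at all in the environment, independent of earth2mip. This sketch assumes the failing call goes through PyTorch's FFT on the CPU; `fft_smoke_test` is a hypothetical helper, and a small random tensor may not reproduce the exact error from ensemble_utils.py.

```python
import importlib.util


def fft_smoke_test(device="cpu"):
    """Run a small 2D real FFT with PyTorch and report whether it succeeded."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    try:
        x = torch.randn(4, 8, 16, device=device)
        torch.fft.rfft2(x)  # FFT over the last two dimensions
    except RuntimeError as err:
        return f"failed on {device}: {err}"
    return f"ok on {device}"


if __name__ == "__main__":
    print(fft_smoke_test("cpu"))
```

If this fails on "cpu" with the same MKL message, the problem is likely in the PyTorch/MKL installation rather than in the ensemble code.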

nbren12 • Oct 07 '24 15:10

@ndp99VN

> graphcast, it cannot find module named haiku for me.

pip install dm-haiku

We also have an extra for graphcast

pip install path/to/earth2mip[graphcast]
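After installing, a short check can confirm which modules are still missing from the active environment. This is a sketch: `missing_modules` is a hypothetical helper, and apart from `haiku` (named in the error above) the module names `jax` and `graphcast` are assumptions about what the GraphCast extra needs.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for a missing parent package or a module without a spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "haiku" comes from the error in this thread; the others are assumed.
    print(missing_modules(["haiku", "jax", "graphcast"]))
```

An empty list means every module was found. Note that this only checks that the modules are discoverable, not that they import cleanly.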

nbren12 • Oct 07 '24 15:10

Closing this issue since it is a bit non-specific and seems mostly related to installation. I'm not sure we can provide a complete guide to installing graphcast, pangu, etc., since the details will change depending on the user's environment. The IFS data source is potentially broken, though, so I opened a new issue for that.

nbren12 • Oct 07 '24 15:10