xMIP
xMIP copied to clipboard
combine_staggered_grid error with CESM2
I'm working on cheyenne with a minimal set of CMIP6-OMIP output and trying to combine thetao and uo on a staggered grid. I get an error with only the CESM2 output that I think might be traceable to the staggered_grid_config.yaml.
Here's what I'm doing:
catalog_file = '/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip6.json'
col = intake.open_esm_datastore(catalog_file)
ds_cesm2 = {}
variable_id = ['thetao','uo']
for vv in variable_id:
cat = col.search(experiment_id = ['omip1'],
variable_id = [vv],
table_id = 'Omon',
grid_label = ['gn'],
source_id = 'CESM2',
member_id = 'r1i1p1f1',
version = 'v20190802')
ds_cesm2[vv] = cat.to_dataset_dict( preprocess=combined_preprocessing,
aggregate=True)
key = 'OMIP.NCAR.CESM2.omip1.Omon.gn'
grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
other_ds=[ds_cesm2['uo'][key]])
and I get this error:
RuntimeError Traceback (most recent call last)
Input In [9], in <cell line: 3>()
1 key = 'OMIP.NCAR.CESM2.omip1.Omon.gn'
----> 3 grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
4 other_ds=[ds_cesm2['uo'][key]])
File ~/miniconda3/envs/xmoc_test/lib/python3.10/site-packages/cmip6_preprocessing/grids.py:465, in combine_staggered_grid(ds_base, other_ds, recalculate_metrics, grid_dict, **kwargs)
463 additional_dims = [di for di in ds_new.dims if di not in ds_g.dims]
464 if len(additional_dims) > 0:
--> 465 raise RuntimeError(
466 f"While trying to parse `{ds_new.name}`, detected dims that are not in the base dataset:[{additional_dims}]"
467 )
468 ds_g[ds_new.name] = ds_new
470 # Restore dims attrs from the beginning
RuntimeError: While trying to parse `uo`, detected dims that are not in the base dataset:[['x_right']]
However, if I manually set the grid_dict, everything works:
cesm2_gdict = {'CESM2':{'gn':{'axis_shift':{'X':'right', 'Y':'right'}}}}
grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
other_ds=[ds_cesm2['uo'][key]], grid_dict = cesm2_gdict)
The grid_dict for native-grid CESM2 in the yaml is different for the X axis-shift:
CESM2:
gn:
axis_shift:
X: left
Y: right
I'm still wrapping my head around xgcm, so I'm not sure if this workaround is getting me to a right answer for the wrong reason or if the staggered_grid_config.yaml needs updating. I thought I'd raise this issue here in case it is something that needs fixing. Thanks for your help!
Hi @emaroon I would say the likely culprit is that the yaml file is plain wrong. I hacked that together using the lon/lat information from the velocities published on ESGF. I have now learned that this is very error prone actually (@jdldeauna had similar experiences a while back, but I cant find that issue right now). I think there are several ways forward, all of which should rely on the native grid information as much as possible (instead of using the lon/lat of the published datasets).
- The most immediate fix would be that users like you submit corrections to the existing yaml after properly ensuring that the information in the existing yaml is wrong.
- The more longterm but my preferred method would move all of that information over to https://github.com/pangeo-forge/CMIP6_static_grids-feedstock. I would want to encode the grid position information in the metadata of the native grids themselves.
Doing 1 would however be very helpful for 2, since we need to parse that information somehow manually for each model 😜
Also happy to have a quick call about these things, since I realize I havent really made up my mind on most of the details and that is probably quite confusing here 😜
I'm still getting comfortable using github, so submitting a fix for the CESM-grids in the yaml is a good exercise for me. Give me a little while, but I can put in a pull request for #1.
I double-checked the CESM2 via CMIP grid: x_u is to the right of x_t. Since all of the CESM contributions to CMIP likely used the same POP-grid (there's only one 1-degree supported POP configuration that I'm aware of), I'll update all of the CESM grids.
Phone call sometime would be great. I'm only a little confused (I think? I could be confused about how confused I really am 🤪) but it would be helpful for me to see where this is headed. I'm also involved with the current phase of the NOAA MDTF, and with more ocean-focused diagnostics now in development, I have a feeling that this staggered grid issue may be on our horizon soon.
This bit of code originally from Ryan should be useful for adding the attributes: https://github.com/NCAR/pop-tools/blob/f8dc20b9962e0fdc7b9a1995503e368ff326e3e7/pop_tools/xgcm_util.py#L35-L44
Yes! It would be nice if all model codes had code like that available haha.
I'll update all of the CESM grids.
This would be a great quick PR, that would patch this issue for now, and until we have a better solution.
Phone call sometime would be great. I'm only a little confused (I think? I could be confused about how confused I really am 🤪) but it would be helpful for me to see where this is headed. I'm also involved with the current phase of the NOAA MDTF, and with more ocean-focused diagnostics now in development, I have a feeling that this staggered grid issue may be on our horizon soon.
I think the best is to try to schedule a meeting on my calendly. Some time next week would be great.
Sounds great - scheduled a meeting next week on your calendly and will have that PR done by then. Thanks again!