xMIP combine_staggered_grid error with CESM2

I'm working on cheyenne with a minimal set of CMIP6-OMIP output and trying to combine thetao and uo on a staggered grid. I get an error with only the CESM2 output that I think might be traceable to the staggered_grid_config.yaml.

Here's what I'm doing:

catalog_file = '/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip6.json'
col = intake.open_esm_datastore(catalog_file)

ds_cesm2 = {}

variable_id = ['thetao','uo']

for vv in variable_id:
    cat = col.search(experiment_id = ['omip1'],
            variable_id = [vv], 
            table_id = 'Omon',
            grid_label = ['gn'],
            source_id = 'CESM2',
            member_id = 'r1i1p1f1', 
            version = 'v20190802')

    ds_cesm2[vv] = cat.to_dataset_dict( preprocess=combined_preprocessing,
        aggregate=True)

key = 'OMIP.NCAR.CESM2.omip1.Omon.gn'
grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
                 other_ds=[ds_cesm2['uo'][key]])

and I get this error:

RuntimeError                              Traceback (most recent call last)
Input In [9], in <cell line: 3>()
      1 key = 'OMIP.NCAR.CESM2.omip1.Omon.gn'
----> 3 grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
      4                  other_ds=[ds_cesm2['uo'][key]])

File ~/miniconda3/envs/xmoc_test/lib/python3.10/site-packages/cmip6_preprocessing/grids.py:465, in combine_staggered_grid(ds_base, other_ds, recalculate_metrics, grid_dict, **kwargs)
    463             additional_dims = [di for di in ds_new.dims if di not in ds_g.dims]
    464             if len(additional_dims) > 0:
--> 465                 raise RuntimeError(
    466                     f"While trying to parse `{ds_new.name}`, detected dims that are not in the base dataset:[{additional_dims}]"
    467                 )
    468             ds_g[ds_new.name] = ds_new
    470 # Restore dims attrs from the beginning

RuntimeError: While trying to parse `uo`, detected dims that are not in the base dataset:[['x_right']]

However, if I manually set the grid_dict, everything works:

cesm2_gdict = {'CESM2':{'gn':{'axis_shift':{'X':'right', 'Y':'right'}}}}
grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
                 other_ds=[ds_cesm2['uo'][key]], grid_dict = cesm2_gdict)

The grid_dict for native-grid CESM2 in the yaml is different for the X axis-shift:

CESM2:
  gn:
    axis_shift:
      X: left
      Y: right

I'm still wrapping my head around xgcm, so I'm not sure if this workaround is getting me to a right answer for the wrong reason or if the staggered_grid_config.yaml needs updating. I thought I'd raise this issue here in case it is something that needs fixing. Thanks for your help!

Jun 27 '22 23:06 emaroon

Hi @emaroon I would say the likely culprit is that the yaml file is plain wrong. I hacked that together using the lon/lat information from the velocities published on ESGF. I have now learned that this is very error prone actually (@jdldeauna had similar experiences a while back, but I cant find that issue right now). I think there are several ways forward, all of which should rely on the native grid information as much as possible (instead of using the lon/lat of the published datasets).

The most immediate fix would be that users like you submit corrections to the existing yaml after properly ensuring that the information in the existing yaml is wrong.
The more longterm but my preferred method would move all of that information over to https://github.com/pangeo-forge/CMIP6_static_grids-feedstock. I would want to encode the grid position information in the metadata of the native grids themselves.

Doing 1 would however be very helpful for 2, since we need to parse that information somehow manually for each model 😜

Jun 28 '22 14:06 jbusecke

Also happy to have a quick call about these things, since I realize I havent really made up my mind on most of the details and that is probably quite confusing here 😜

Jun 28 '22 14:06 jbusecke

I'm still getting comfortable using github, so submitting a fix for the CESM-grids in the yaml is a good exercise for me. Give me a little while, but I can put in a pull request for #1.

I double-checked the CESM2 via CMIP grid: x_u is to the right of x_t. Since all of the CESM contributions to CMIP likely used the same POP-grid (there's only one 1-degree supported POP configuration that I'm aware of), I'll update all of the CESM grids.

Phone call sometime would be great. I'm only a little confused (I think? I could be confused about how confused I really am 🤪) but it would be helpful for me to see where this is headed. I'm also involved with the current phase of the NOAA MDTF, and with more ocean-focused diagnostics now in development, I have a feeling that this staggered grid issue may be on our horizon soon.

Jun 28 '22 15:06 emaroon

This bit of code originally from Ryan should be useful for adding the attributes: https://github.com/NCAR/pop-tools/blob/f8dc20b9962e0fdc7b9a1995503e368ff326e3e7/pop_tools/xgcm_util.py#L35-L44

Jun 28 '22 15:06 dcherian

Yes! It would be nice if all model codes had code like that available haha.

Jun 28 '22 23:06 jbusecke

I'll update all of the CESM grids.

This would be a great quick PR, that would patch this issue for now, and until we have a better solution.

Jun 30 '22 19:06 jbusecke

Phone call sometime would be great. I'm only a little confused (I think? I could be confused about how confused I really am 🤪) but it would be helpful for me to see where this is headed. I'm also involved with the current phase of the NOAA MDTF, and with more ocean-focused diagnostics now in development, I have a feeling that this staggered grid issue may be on our horizon soon.

I think the best is to try to schedule a meeting on my calendly. Some time next week would be great.

Jun 30 '22 19:06 jbusecke

Sounds great - scheduled a meeting next week on your calendly and will have that PR done by then. Thanks again!

Jul 01 '22 17:07 emaroon

xMIP xMIP copied to clipboard

combine_staggered_grid error with CESM2

xMIP
xMIP copied to clipboard