xMIP Using cf_xarray to infer attributes

What was done

This PR adds the .cf.guess_coord_axis() and .cf.add_canonical_attributes() functions to the preprocessing component of the library. This will update the dataset with additional attributes in line with CF conventions that can be inferred by the cf_xarray library.

See: Issue #185

How it was done

Two lines were added to the rename_cmip6 function:

    ds = ds.cf.guess_coord_axis()
    ds = ds.cf.add_canonical_attributes()

I put them in the rename_cmip6 function in preprocessing.py because it seemed to semantically fit, but I'm certainly open to moving them if there's a better spot.

Testing

I added an additional assertion to the test_rename_cmip6 function in test_preprocessing.py. The test is weak mostly because there isn't a lot that is guaranteed to be true after running the additional steps here. The one check I added (because I believe it should always be true) is that history is present in the global attributes. I could certainly be convinced that this in unnecessary and the test should be removed.

WIP

This is a Work In Progress because I actually can't get the test to pass (😱). When I use my local version of the library it works just fine so I'm not sure why the tests are working. I thought I'd still post this, though, since the necessity of the test is questionable, before spending oodles of time debugging it.

Seeing the test problem

Personally I ran my tests with python -m pytest . in my conda activated environment.
This PR has a slightly annoying number of print statements, which I left in to help demonstrate how it seems that the preprocessing file is correctly using the new lines of code but (at least for me) the tests aren't. I print the dataset keys in both the preprocessing file and the test file to show the error.

Code I use to manually test, since the regular test is failing

import numpy as np
import xarray as xr

from cmip6_preprocessing.preprocessing import rename_cmip6, cmip6_renaming_dict

def create_test_ds(xname, yname, zname, xlen, ylen, zlen):
    x = np.linspace(0, 359, xlen)
    y = np.linspace(-90, 89, ylen)
    z = np.linspace(0, 5000, zlen)

    data = np.random.rand(len(x), len(y), len(z))
    ds = xr.DataArray(data, coords=[(xname, x), (yname, y), (zname, z)]).to_dataset(name="test")
    ds.attrs["source_id"] = "test_id"
    # if x and y are not lon and lat, add lon and lat to make sure there are no conflicts
    lon = ds[xname] * xr.ones_like(ds[yname])
    lat = xr.ones_like(ds[xname]) * ds[yname]
    if xname != "lon" and yname != "lat":
        ds = ds.assign_coords(lon=lon, lat=lat)
    else:
        ds = ds.assign_coords(longitude=lon, latitude=lat)
        return ds


ds = create_test_ds('i', 'j', 'lev', 10, 5, 6)
ds_renamed = rename_cmip6(ds, cmip6_renaming_dict())
# See 'history' in global attributes of `ds_renamed` but not in `ds`

Aug 05 '21 18:08 rwegener2

Thank you so much for adding this @rwegener2.

I think we need to clearly define what we want this implementation to do. Under which circumstances will a history attribute be generated or not? Maybe @dcherian can help there.

I think one major thing to check is if the axes are correctly detected. We 'know' the association between axes and dimensions roughly: x->X, y->Y, lev->Z. Can we write a test that checks something like this (pseudo-code):

def test_cf_axes(ds):
    ds_preprocessed = combined_preprocessind(ds)
    for dim, axis in [('x', 'X'), ('y', 'Y'), ('lev','Z'), ('time', 'T')]:
        if dim in ds_preprocessed.dims:
             #check that the axis is in the cf object
             axis in ds_preprocessed.cf.axes.keys()

You can create a bunch of test cases in the normal tests under https://github.com/jbusecke/cmip6_preprocessing/blob/master/tests/test_preprocessing.py, but I think this would be REALLY useful to check in the cloud tests , because that would enable us to actually see which models fail and why.

I put them in the rename_cmip6 function in preprocessing.py because it seemed to semantically fit, but I'm certainly open to moving them if there's a better spot.

I think I would prefer a seperate function like cf_parsing or similar, just to keep things neat.

Aug 06 '21 13:08 jbusecke

Nice work @rwegener2 .

The history attribute is only added when .cf.add_canonical_attributes adds something.

I think you'll want to write a test like @jbusecke suggested. These are the attributes that .cf.guess_coord_axis adds: https://github.com/xarray-contrib/cf-xarray/blob/e628079a7ce25c40a3d9fff974615c84ca1bd82d/cf_xarray/accessor.py#L58-L67

Aug 06 '21 14:08 dcherian

Thanks for the feedback on this. I apologize for taking so insanely long to respond - I wasn't working on any development in August - but I am settled back in so I can be much more responsive going forward.

I updated the test_preprocessing.py test to use the axis -> dimension mapping rather than checking the history. I also cleaned up the code in preprocessing.py to use a separate _parse_cf() function.

As for the cloud test, I added the same chunk of testing code into the test_check_dim_coord_values_wo_intake function. This seems to be the right place, but I also didn't see anywhere in that testing script where the rename_cmip6() function gets run (which is where the _parse_cf() function gets called). @jbusecke let me know if this wasn't the right way to do that.

I'm sorry again for the long delay and let me know if there is anything else that should be changed.

Sep 09 '21 15:09 rwegener2

.cf.add_canonical_attributes

I think you want to avoid this. It adds units which could be totally wrong. You can skip that with skip="units" but I don't think it adds anything useful if you skip the units

Sep 09 '21 17:09 dcherian

Thanks for the feedback @dcherian. I just removed that so that only .cf.guess_coord_axis() is used.

Sep 10 '21 17:09 rwegener2

Hey everyone,

I am very sorry for dropping the ball here. I should have a bit more capacity to maintain cmip6_pp going forward. Is this still of interest? Would be happy to help to move this along.

Mar 17 '22 00:03 jbusecke

Hey @jbusecke!

If you think this is a helpful addition to the project I'm happy to put some energy to get it merged. I just pushed a superficial change to trigger the checks again because the previous failure logs got deleted with inactivity. I don't remember seeing much in the logs that made sense to me, but if you know what I should do to get the test to pass just let know. Thanks, Julius!

Mar 21 '22 14:03 rwegener2

I think I will dedicate some time to cmip6_pp in 1-2 weeks. How about I ping this issue then and we can iterate on it? I still believe this would be a great addition.

Mar 21 '22 15:03 jbusecke

Sounds good, I'll keep an eye out!

Mar 22 '22 12:03 rwegener2

xMIP xMIP copied to clipboard

Using cf_xarray to infer attributes

What was done

How it was done

Testing

WIP

Seeing the test problem

xMIP
xMIP copied to clipboard