xMIP
xMIP copied to clipboard
Using cf_xarray to infer attributes
What was done
This PR adds the .cf.guess_coord_axis() and .cf.add_canonical_attributes() functions to the preprocessing component of the library. This will update the dataset with additional attributes in line with CF conventions that can be inferred by the cf_xarray library.
See: Issue #185
How it was done
Two lines were added to the rename_cmip6 function:
ds = ds.cf.guess_coord_axis()
ds = ds.cf.add_canonical_attributes()
I put them in the rename_cmip6 function in preprocessing.py because it seemed to semantically fit, but I'm certainly open to moving them if there's a better spot.
Testing
I added an additional assertion to the test_rename_cmip6 function in test_preprocessing.py. The test is weak mostly because there isn't a lot that is guaranteed to be true after running the additional steps here. The one check I added (because I believe it should always be true) is that history is present in the global attributes. I could certainly be convinced that this in unnecessary and the test should be removed.
WIP
This is a Work In Progress because I actually can't get the test to pass (😱). When I use my local version of the library it works just fine so I'm not sure why the tests are working. I thought I'd still post this, though, since the necessity of the test is questionable, before spending oodles of time debugging it.
Seeing the test problem
Personally I ran my tests with python -m pytest . in my conda activated environment.
This PR has a slightly annoying number of print statements, which I left in to help demonstrate how it seems that the preprocessing file is correctly using the new lines of code but (at least for me) the tests aren't. I print the dataset keys in both the preprocessing file and the test file to show the error.
Code I use to manually test, since the regular test is failing
import numpy as np
import xarray as xr
from cmip6_preprocessing.preprocessing import rename_cmip6, cmip6_renaming_dict
def create_test_ds(xname, yname, zname, xlen, ylen, zlen):
x = np.linspace(0, 359, xlen)
y = np.linspace(-90, 89, ylen)
z = np.linspace(0, 5000, zlen)
data = np.random.rand(len(x), len(y), len(z))
ds = xr.DataArray(data, coords=[(xname, x), (yname, y), (zname, z)]).to_dataset(name="test")
ds.attrs["source_id"] = "test_id"
# if x and y are not lon and lat, add lon and lat to make sure there are no conflicts
lon = ds[xname] * xr.ones_like(ds[yname])
lat = xr.ones_like(ds[xname]) * ds[yname]
if xname != "lon" and yname != "lat":
ds = ds.assign_coords(lon=lon, lat=lat)
else:
ds = ds.assign_coords(longitude=lon, latitude=lat)
return ds
ds = create_test_ds('i', 'j', 'lev', 10, 5, 6)
ds_renamed = rename_cmip6(ds, cmip6_renaming_dict())
# See 'history' in global attributes of `ds_renamed` but not in `ds`
Thank you so much for adding this @rwegener2.
I think we need to clearly define what we want this implementation to do. Under which circumstances will a history attribute be generated or not? Maybe @dcherian can help there.
I think one major thing to check is if the axes are correctly detected. We 'know' the association between axes and dimensions roughly: x->X, y->Y, lev->Z. Can we write a test that checks something like this (pseudo-code):
def test_cf_axes(ds):
ds_preprocessed = combined_preprocessind(ds)
for dim, axis in [('x', 'X'), ('y', 'Y'), ('lev','Z'), ('time', 'T')]:
if dim in ds_preprocessed.dims:
#check that the axis is in the cf object
axis in ds_preprocessed.cf.axes.keys()
You can create a bunch of test cases in the normal tests under https://github.com/jbusecke/cmip6_preprocessing/blob/master/tests/test_preprocessing.py, but I think this would be REALLY useful to check in the cloud tests , because that would enable us to actually see which models fail and why.
I put them in the rename_cmip6 function in preprocessing.py because it seemed to semantically fit, but I'm certainly open to moving them if there's a better spot.
I think I would prefer a seperate function like cf_parsing or similar, just to keep things neat.
Nice work @rwegener2 .
The history attribute is only added when .cf.add_canonical_attributes adds something.
I think you'll want to write a test like @jbusecke suggested. These are the attributes that .cf.guess_coord_axis adds:
https://github.com/xarray-contrib/cf-xarray/blob/e628079a7ce25c40a3d9fff974615c84ca1bd82d/cf_xarray/accessor.py#L58-L67
Thanks for the feedback on this. I apologize for taking so insanely long to respond - I wasn't working on any development in August - but I am settled back in so I can be much more responsive going forward.
I updated the test_preprocessing.py test to use the axis -> dimension mapping rather than checking the history. I also cleaned up the code in preprocessing.py to use a separate _parse_cf() function.
As for the cloud test, I added the same chunk of testing code into the test_check_dim_coord_values_wo_intake function. This seems to be the right place, but I also didn't see anywhere in that testing script where the rename_cmip6() function gets run (which is where the _parse_cf() function gets called). @jbusecke let me know if this wasn't the right way to do that.
I'm sorry again for the long delay and let me know if there is anything else that should be changed.
.cf.add_canonical_attributes
I think you want to avoid this. It adds units which could be totally wrong. You can skip that with skip="units" but I don't think it adds anything useful if you skip the units
Thanks for the feedback @dcherian. I just removed that so that only .cf.guess_coord_axis() is used.
Hey everyone,
I am very sorry for dropping the ball here. I should have a bit more capacity to maintain cmip6_pp going forward. Is this still of interest? Would be happy to help to move this along.
Hey @jbusecke!
If you think this is a helpful addition to the project I'm happy to put some energy to get it merged. I just pushed a superficial change to trigger the checks again because the previous failure logs got deleted with inactivity. I don't remember seeing much in the logs that made sense to me, but if you know what I should do to get the test to pass just let know. Thanks, Julius!
I think I will dedicate some time to cmip6_pp in 1-2 weeks. How about I ping this issue then and we can iterate on it? I still believe this would be a great addition.
Sounds good, I'll keep an eye out!