xMIP
xMIP copied to clipboard
`replace_x_y_nominal_lat_lon` does not work for > 360 `lon` coordinates
Thanks a lot for this great tool that comes in super handy for harmonizing CMIP data
The bug
When using data with more than 360 lon coordinates, the replace_x_y_nominal_lat_lon
will not work as a 360 periodicity is assumed in _interp_nominal_lon
. I noticed this behavior in EC-Earth-Consortium EC-Earth3 ssp585 r1i1p1f1 Amon tas gr v20200310
, which has this subdegree gridding (lon
being 512 datapoints long).
from xmip.preprocessing import promote_empty_dims, replace_x_y_nominal_lat_lon, rename_cmip6, correct_coordinates, broadcast_lonlat
ds = xr.open_mfdataset( ... )
def recast(data_set):
ds = rename_cmip6(data_set)
ds = promote_empty_dims(ds)
ds = broadcast_lonlat(ds)
ds = replace_x_y_nominal_lat_lon(ds)
return ds
ds.isel(time=0)['tas'].plot()
plt.show()
recast(ds).isel(time=0)['tas'].plot()
plt.show()
Expected behavior
Don't suffle the data when interpolating with sub-degree longitude gridding. This can be solved by not assuming a 360 periodicity.
Underlying cause:
numpy.interp
takes the period argument, but this is in the x-coordinate
, however, in _interp_nominal_lon
, the period is set to 360, this causes funky behavior in this function when you feed it an array with >360 entries, as they are sorted by their arg index.
MWE:
Maybe a small example might be helpful here:
# Create some dummy data
import numpy as np
import xarray as xr
from xmip.preprocessing import promote_empty_dims, replace_x_y_nominal_lat_lon, rename_cmip6, correct_coordinates, broadcast_lonlat
import matplotlib.pyplot as plt
lon=np.linspace(0,360, 513)[:-1]
lat=np.linspace(-90,90, 181)[:-1]
time=np.arange(1)
# Totally arbitrary data
data=np.arange(len(lat)*len(lon)*len(time)).reshape(len(time), len(lat), len(lon))*lon
# Add some NaN values just as an example
data[:, :, len(lon)//2+30: len(lon)//2+50] = np.nan
ds_dummy = xr.Dataset(
data_vars = dict(var=(('time','lat', 'lon'), data,)),
coords = dict(time=time, lon=lon, lat=lat,),
attrs=dict(source_id='bla'))
ds_dummy['var'].isel(time=0).plot(cbar_kwargs=dict(orientation='horizontal'))
# Now apply the recast using replace_x_y_nominal_lat_lon
def recast(data_set):
ds = rename_cmip6(data_set)
ds = promote_empty_dims(ds)
ds = broadcast_lonlat(ds)
ds = replace_x_y_nominal_lat_lon(ds)
return ds
ds_dum_recast = recast(ds_dummy)
replace_x_y_nominal_lat_lon(ds_dum_recast)['var'].isel(time=0).plot(cbar_kwargs=dict(orientation='horizontal'),)
This second imagine is clearly not what we'd hoped for.
Versions
import sys
import numpy, xarray, xmip
message = f'{sys.platform}\n{sys.version}'
for b in 'numpy xarray xmip'.split():
message += f'\n\t{b}=={locals().get(b).__version__}'
print(message)
linux
3.8.16 (default, Mar 2 2023, 03:21:46)
[GCC 11.2.0]
numpy==1.24.3
xarray==2023.1.0
xmip==0.7.1
Related PRs/Issues
I believe https://github.com/jbusecke/xMIP/issues/93 is also related to this and https://github.com/jbusecke/xMIP/pull/79 might be where the bug was introduced.
One way this might be solved is by replacing the above mentioned line as is done in this PR: https://github.com/jbusecke/xMIP/pull/296
I am also encountering some of the issues described here (and similar once related to coordinates). For some models the longitude fixing seems to screw the data, as also experienced by @Quentin-Bourgeois. Is a fix for this issue in the pipeline @jbusecke?
Hi @JoranAngevaare and @bguillod. Very sorry for the long radio silence here. Trying to catch up with a lot of maintenance after working on the new Pangeo/ESGF CMIP6 Zarr data ingestion.
First of all, thanks a lot for the carefully crafted report here @JoranAngevaare. And even more for putting in a fix for this. Ill reopen the PR and continue discussion over there?