xESMF
Halos - (Partially) duplicated columns and rows in the source grid
Description
Climate model data can be provided on grids that feature halo regions, which are partially or entirely duplicated columns or rows at the edges of the grid (e.g. for about half of the ocean models in CMIP6, data is provided on grids that include a halo). Here is a quote from the NEMO ocean engine document:
Since bicubic interpolation requires the calculation of gradients at each point on the grid, the corresponding arrays are dimensioned with a halo of width one grid point all the way around. When the array of points from the data file is adjacent to an edge of the data grid, the halo is either a copy of the row/column next to it (non-cyclical case), or is a copy of one from the first few columns on the opposite side of the grid (cyclical case).
[Figure: sketch of the cyclical case]
For the conservative remapping method, the values of the duplicated cells are added to the values of the original cells, so the affected destination cells end up with values that are too high.
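The effect can be reproduced with a toy 1D example. This is only a minimal sketch under simplifying assumptions: the helper `overlap_weights` is hypothetical and computes plain overlap-fraction weights with NumPy, it does not call ESMF or xESMF, and the grid is made up for illustration.

```python
import numpy as np

def overlap_weights(src_edges, dst_edges):
    """Toy conservative weights: w[i, j] = overlap(dst i, src j) / length(dst i)."""
    src_lo, src_hi = src_edges[:, 0], src_edges[:, 1]
    dst_lo, dst_hi = dst_edges[:, 0], dst_edges[:, 1]
    overlap = np.clip(
        np.minimum(dst_hi[:, None], src_hi[None, :])
        - np.maximum(dst_lo[:, None], src_lo[None, :]),
        0.0,
        None,
    )
    return overlap / (dst_hi - dst_lo)[:, None]

# Source: four unit cells covering [0, 4], plus a halo copy of the first cell.
src_edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [0, 1]], dtype=float)
src_values = np.full(5, 10.0)
# Destination: two cells of width 2.
dst_edges = np.array([[0, 2], [2, 4]], dtype=float)

weights = overlap_weights(src_edges, dst_edges)
remapped = weights @ src_values   # → [15., 10.]
row_sums = weights.sum(axis=1)    # → [1.5, 1.0]
```

Although the source field is uniformly 10, the destination cell that overlaps the duplicated cell receives 15, and its summed weight is 1.5 instead of 1.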
One solution would be to simply cut off the halo rows/columns before generating the weights (if one knows their location in the grid). However, in more complicated curvilinear grids it can occur that rows and columns are only partially duplicated. It can also occur that, while the cell centers are duplicated properly, the bounds are not duplicated 1:1 but rather "sloppily", which leads to cells collapsing to lines or points, or to cells having their centers outside their bounds. This causes further problems when generating remapping weights with xESMF which, apart from the incorrect conservative values, can often be dealt with by setting the ignore_degenerate option to True (though I don't know whether the result is then correct).
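To get a rough idea of how many cells have collapsed to lines or points, one could compute a planar polygon area from the cell bounds. The following is a sketch under assumptions: the bounds are assumed to be stored as arrays of shape (ny, nx, 4) with the vertices in order around the cell, the area is computed in the lon/lat plane without handling the dateline wrap, and `cell_areas` is a hypothetical helper, not the test that ESMF itself applies.

```python
import numpy as np

def cell_areas(lon_b, lat_b):
    """Signed shoelace area per cell in the lon/lat plane.

    lon_b and lat_b have shape (ny, nx, 4). A positive area means the
    vertices are ordered counterclockwise, a negative area clockwise.
    """
    lon_next = np.roll(lon_b, -1, axis=-1)
    lat_next = np.roll(lat_b, -1, axis=-1)
    return 0.5 * np.sum(lon_b * lat_next - lon_next * lat_b, axis=-1)

# Toy 1x3 grid: a regular cell, a cell collapsed to a line,
# and a cell collapsed to a point.
lon_b = np.array([[[0, 1, 1, 0], [1, 1, 1, 1], [2, 2, 2, 2]]], dtype=float)
lat_b = np.array([[[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]], dtype=float)

areas = cell_areas(lon_b, lat_b)
degenerate = np.isclose(areas, 0.0)   # → [[False, True, True]]
```

Cells flagged this way are the ones for which ignore_degenerate is likely to matter. The sign of the area also hints at clockwise vertex ordering, which cdo verifygrid reports separately.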
See for example this cdo verifygrid output:
>>cdo verifygrid CMIP6/CMIP/CMCC/CMCC-CM2-HR4/historical/r1i1p1f1/Omon/tos/gn/v20200904/tos_Omon_CMCC-CM2-HR4_historical_r1i1p1f1_gn_197501-199912.nc:
cdo verifygrid: Grid consists of 1515542 (1442x1051) cells (type: curvilinear), of which
cdo verifygrid: 322 cells have 3 vertices
cdo verifygrid: 1513765 cells have 4 vertices
cdo verifygrid: 277 cells have duplicate vertices
cdo verifygrid: 1455 cells have unusable vertices
cdo verifygrid: 4496 cells are not unique
cdo verifygrid: 1674 cells are non-convex
cdo verifygrid: 134 cells have their vertices arranged in a clockwise order
cdo verifygrid: 5126 cells have their center point located outside their boundaries
cdo verifygrid: longitude : 9.094947e-13 to 360 degrees
cdo verifygrid: latitude : -78.79526 to 89.94787 degrees
The following notebook shows the behaviour of xESMF when dealing with data from the two ocean models MPI-ESM1-2-LR/HR MPIOM, both of which feature a halo: https://nbviewer.jupyter.org/github/roocs/regrid-prototype/blob/main/docs/notebooks/xESMF_Behaviour_Halo.ipynb
A first way to identify the unique columns of a grid:
import numpy as np

# ds is assumed to be an xarray Dataset with 2D "latitude" and "longitude" arrays;
# create a 2D array of (lat, lon) tuples
latlon_halo = np.array(
    list(zip(ds["latitude"].values.ravel(), ds["longitude"].values.ravel())),
    dtype="double,double",
).reshape(ds["longitude"].values.shape)
# use numpy.unique to identify the unique columns
latlon_no_halo, indices = np.unique(latlon_halo, axis=1, return_index=True)
If one does not know which grid cells are part of the halo, and if the halo cells or their bounds are problematic, it is not enough to just select the unique grid cells: one has to decide which of the cells are the duplicates and which are the originals.
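For partially duplicated rows/columns, a cell-wise check may be more useful than a column-wise one. The sketch below flags every cell whose center has already occurred earlier in row-major order. The helper `duplicate_center_mask` and the rounding tolerance `decimals` are assumptions for illustration, and longitudes are compared as given (no modulo 360 handling).

```python
import numpy as np

def duplicate_center_mask(lat, lon, decimals=6):
    """True for cells whose (lat, lon) center repeats an earlier cell."""
    pairs = np.column_stack(
        [np.round(lat.ravel(), decimals), np.round(lon.ravel(), decimals)]
    )
    # return_index gives the first occurrence of each unique (lat, lon) pair
    _, first_index = np.unique(pairs, axis=0, return_index=True)
    mask = np.ones(pairs.shape[0], dtype=bool)
    mask[first_index] = False
    return mask.reshape(lat.shape)

# Toy grid with a cyclic halo: column 0 repeats column 4,
# and column 5 repeats column 1.
lon_1d = np.array([300.0, 0.0, 100.0, 200.0, 300.0, 0.0])
lat_1d = np.array([-10.0, 10.0])
lat, lon = np.meshgrid(lat_1d, lon_1d, indexing="ij")

mask = duplicate_center_mask(lat, lon)
# → [[False, False, False, False, True, True],
#    [False, False, False, False, True, True]]
```

Note that the mask marks the later occurrence of each pair (columns 4 and 5 here), which is not necessarily the halo: in this toy grid column 0 is a halo column but is treated as the original. Deciding which copy to keep still requires knowledge of the grid.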
Possible solutions to deal with a halo in xESMF, when using the conservative method, as seen in the Jupyter Notebook above.
For the bilinear and patch methods, only the "weirder" properties of some of the duplicated cells or bounds seem to cause problems, which can mostly be resolved using the ignore_degenerate option.
For the conservative method, when one knows where the halo is located, I tested:
- Masking the duplicated cells
- Setting the remapping weight matrix entries for the duplicated cells to 0 (before applying add_nans_to_weights)
- Using the adaptive masking (skipna) method to re-normalize the too high values (which unexpectedly does not work too well)
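The second option can be sketched on a small dense weight matrix. This is a toy illustration under assumptions: real xESMF weights are stored as a sparse matrix, the helper `zero_halo_columns` is hypothetical, and the weights and halo mask below are made up. Rows correspond to destination cells and columns to source cells.

```python
import numpy as np

def zero_halo_columns(weights, halo_mask):
    """Return a copy of the weights without contributions from halo source cells."""
    cleaned = np.array(weights, dtype=float, copy=True)
    cleaned[:, np.asarray(halo_mask).ravel()] = 0.0
    return cleaned

# Two destination cells, five source cells; the last source cell is a halo
# copy of the first one, so the first row sums to 1.5 instead of 1.
weights = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5, 0.0],
])
halo_mask = np.array([False, False, False, False, True])
src_values = np.full(5, 10.0)

cleaned = zero_halo_columns(weights, halo_mask)
remapped = cleaned @ src_values   # → [10., 10.]
```

After removing the halo contributions, every row of the weight matrix sums to 1 again and the uniform source field is reproduced on the destination grid.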
Do you have any feedback regarding the methods I tested to deal with halos? Do you have any further suggestions how to deal with grid halos or how to identify the duplicated version(s) of a cell?
Do you think xESMF should issue a warning or throw an exception when encountering duplicated cells (or even overlapping cells) and the conservative method is used? The warning could be issued when skipna notices entries in the normalisation array (fraction_valid) that are greater than 1 (as that should mean overlapping cells?!), or one could use, for example, the code above to check whether duplicate cell centers exist.
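Such a check could look roughly like the following. It is a sketch of the idea, not existing xESMF behaviour: the function `warn_on_overlap` and the tolerance `tol` are hypothetical, the weights are a dense toy matrix, and the summed weights per destination cell are used as a stand-in for the normalisation array in the absence of missing values.

```python
import warnings
import numpy as np

def warn_on_overlap(weights, tol=1e-6):
    """Warn if any destination cell receives a summed weight above 1."""
    row_sums = np.asarray(weights).sum(axis=1)
    n_bad = int(np.count_nonzero(row_sums > 1.0 + tol))
    if n_bad:
        warnings.warn(
            f"{n_bad} destination cell(s) have summed weights > 1; "
            "the source grid may contain duplicated or overlapping cells.",
            UserWarning,
        )
    return n_bad

# Toy weights: the first destination cell sees a duplicated source cell.
weights = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5, 0.0],
])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n_bad = warn_on_overlap(weights)   # → 1
```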
Do you think xESMF should feature a method to deal with halos, or to identify halos?
My experience with the ORCA family of grids is that the first and last columns are duplicates, so they can be safely removed and the regridding done on the remaining data, e.g. [:,1:-1].
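As a minimal illustration of that slicing, assuming a cyclic layout in which the first column repeats the second-to-last one and the last column repeats the second one (the toy field below is made up):

```python
import numpy as np

# Four unique columns, wrapped with one halo column on each side.
core = np.arange(8.0).reshape(2, 4)
with_halo = np.concatenate([core[:, -1:], core, core[:, :1]], axis=1)

# Drop the first and last columns before computing regridding weights.
trimmed = with_halo[:, 1:-1]
```

For an xarray dataset, the equivalent would be an `isel` with `slice(1, -1)` along the longitude-like dimension, whose name depends on the model.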
I don't think xESMF should add extra code to deal with halos. To my knowledge, NEMO is the only model outputting these extra columns.
The ocean model MPIOM from MPI-ESM also has a halo.
Thanks for your reply @raphaeldussin. As @aaronspring notes, there are other models besides NEMO that output extra rows/columns.
I made a list of all CMIP6 ocean models I could find, showing which ones have duplicated cell centers or bounds, or collapsing cells, and quite a lot are affected: https://c6dreq.dkrz.de/files/ocean_grids.php
From my experience with CMIP6 ocean data, duplicated cells mostly originate from a halo, but sometimes also from incorrect grid descriptions / incorrectly defined coordinate variables. There are cases with partially duplicated rows/columns, and cases where the bounds of a grid column are duplicates but not the cell centers, or vice versa.
Since such grids and mistakes seem common (looking at CMIP6 data) and I doubt that many users are aware of them, I still think it might be helpful to at least issue a warning when duplicated or collapsed cells are encountered.
Apart from that, do you agree that, if one cannot cut off the halo, providing xESMF with a mask of the duplicated / halo cells for the weights generation is the optimal way to deal with this (incl. the ignore_degenerate option if the grid contains collapsing cells as well)?
@stefraynaud I thought the adaptive masking (skipna) would be able to renormalize contributions of >1 just as it does for contributions of <1. Do you have an idea why it does not work too well for overlapping / duplicated cells (link from above: https://nbviewer.jupyter.org/github/roocs/regrid-prototype/blob/main/docs/notebooks/xESMF_Behaviour_Halo.ipynb)?