iris
                                
                                 iris copied to clipboard
                                
                                    iris copied to clipboard
                            
                            
                            
                        Unify regridders
📰 Custom Issue
There are a collection of regridders in iris which each work slightly differently, but are similar enough that they could potentially benefit from unifying some of their behaviour and perhaps have them inherit from the same abstract class. This would also benefit regridders built outside of iris such as in iris-esmf-regrid. The current set of iris regridding schemes is:
- Linear
- Nearest
- AreaWeighted
- UnstructuredNearest
- PointInCell
There is also the deprecated regridder in iris.experimental.regrid_conservative which is functionally replaced by iris-esmf-regrid.
The current set of regridders have the following quirks:
Linear and Nearest
- The LinearandNearestregridding schemes create aRectilinearRegridder(from analysis._regrid). This uses_RegularGridInterpolator(from analysis._scipy_interpolate) to calculate weights and then use those weights, this derives from scipy and is under a scipy copyright, though the code has developed significantly since it was copied from scipy.
- Source and target grid must be lat/lon DimCoords.
- Weights are not cached on RectilinearRegridder, though_RegularGridInterpolatordoes offer weight caching.
- Weights matrix is a sparse csr_matrix for Linear, for Nearest, weights take the form of an index rather than a matrix.
- Loops through 2D slices for calculations (rather than using transposition and reshaping like in iris-esmf-regrid).
- Has interpolatormethod.
- Has powerful _create_cubemethod which handles derived coordinates. Even handles derived coordinates covering the grid dimension via the use of a regridder function callback.
- Lazy regridding is supported via map_complete_blocks.
- Additional keywords: extrapolation_mode
AreaWeighted
- Creates AreaWeightedRegridder(from analysis._area_weighted).
- Source and target grid must be lat/lon DimCoords.
- Weights are cached on __init__.
- Uses python for loop for calculations rather than a sparse matrix.
- Uses RectilinearRegridder._create_cubewith the callback being equivalent to Linear. This means that a different regridding method is acting on the AuxCoords and the data.
- The regridder function equivalent regrid_area_weighted_rectilinear_src_and_gridhandles scalar grid coords, though the regridder class itself does not.
- Requires special handling of masked data in calculations.
- Lazy regridding is supported via map_complete_blocks.
- Additional keywords: mdtol
UnstructuredNearest
- Creates UnstructuredNearestNeighbourRegridder(from analysis.trajectory) which usesanalysis.trajectory.interpolatefor calculations.
- Target grid must be lat/lon DimCoords, Source grid must be lat/lon AuxCoords of any dimensionality which map over the same cube dimensions.
- Weights are not cached.
- In the current calculation, "weights" are fancy indices.
- Mask handling is not currently proper (see #4463), but could be made similar to Nearest.
- Cube creation handled mostly by trajectory.interpolaterather thanRectilinearRegridder._create_cube. This handles derived coords but not those which cover the grid dimensions, so a callback is not used.
- Lazy regridding not supported.
PointInCell
- Creates CurvilinearRegridder(from analysis._regrid)
- Target grid must be lat/lon DimCoords, Source grid must be 2D lat/lon AuxCoords which map over the same cube dimensions.
- Weights aren't cached on __init__but are cached on__call__.
- Weights matrix is a sparse csc_matrix (rather than the csr_matrix used in Linear).
- Loops through 2D slices for calculations (rather than using transposition and reshaping like in iris-esmf-regrid).
- Cube creation happens during perform inside loop over 2D slices. The resulting cube is a result of merging. Derived coordinates are not handled.
- Unlike other regridders, caching happens within the __call__function.
- Lazy regridding not supported.
- Additional keywords: weights
Suggestions
Currently, the most sophisticated regridder for metadata handling  is RectilinearRegridder. The problem seems to be that other regridders have slightly different use cases so they can't fully take advantage of the infrastructure in RectilinearRegridder. These regridders could be improved by having them derive from a regridder class which is more generic than RectilinearRegridder. This could also help the creation of future regridders (like in iris-esmf-regrid). Some ideas for how this regridder might be structured:
_AbstractRegridder
- Is an abstract class.
- Source/target grid Coords can be 1D lat/lon DimCoords over two separate dimeansions or lat/lon AuxCoords of any dimensionality (likely just 1D or 2D) over common dimensions.
- Additional kwargs are stored and passed into relevent methods.
- Methods for extracting (and checking validity of) grid Coords implemented for DimCoordsandAuxCoordswith a view that this could be extended forMeshin a subclass.
- Store source and target grid.
- Cache weights on __init__.
- Methods for checking compatibility of cubes in call and fetching grid dimensions are implemented for DimCoordsandAuxCoordswith a view that this could be extended forMeshin a subclass.
- Methods for calculating and applying weights are probably not implemented here, though the application of weights is likely to be common for some regridders so it may be appropriate to have helper functions for common calculations.
- More generic version of RectilinearRegridder._create_cube.
- The new _create_cubemethod should call on methods for adding the grid Coords with a view that this could be extended forMeshin a subclass.
Rough proposed common structure of regridders:
The following is copied from a comment in #4807 where it demonstrates a structure to aim for which would allow the possibility to derive from a common class. The aim here is to bring all regridder closer to this structure, improving their functionality along the way, and then refactor this structure in terms of class inheritance when that becomes possible.
def regrid(data, dims, regrid_info):
  ...
  return new_data
def create_cube(data, src, src_dims, tgt_coords, callback):
  ...
  return result_cube
def _prepare(src, tgt):
  ...
  return regrid_info
def _perform(cube, regrid_info):
  ...
  dims = get_dims(cube)
  data = regrid(cube.data, dims, regrid_info)
  callback = functools.partial(regrid, regrid_info=regrid_info)
  return create_cube(data, src, dims, tgt_coords, callback)
class Regridder:
  def __init__(src, tgt):
    ...
    self._regrid_info = _prepare(src, tgt)
  def __call__(src):
    ...
    return _perform(src, self._regrid_info)
Sub-tasks/Side-tasks
Aside from the larger task of refactoring all the regridders to derive from an abstract class, there are several smaller tasks which could help to unify the behaviour of regridders. Bear in mind some of these tasks may have redundancy with the larger refactoring task.
- [x] Rewrite AreaWeightedRegridderto use sparse matrices (similar to iris-esmf-regrid, assuming this improves performance). #5365
- [ ] Decide between csc_matrixandcsr_matrixfor consistency, comparing performance.
- [x] Rewrite the calculations in CurvilinearRegridderandRectilinearRegridderto work on whole cubes rather than slices. #4807
- [x] Fix UnstructuredNearestmask handling. #5062
- [x] Rewrite the way callbacks work in _create_cubeso that they can be consistent with the regridder they are being used in. #4807
- [ ] Make calculation lazy where possible. This may have to be done via a different method for nearest neighbour regridders which seem to work by indexing, so there may be a more appropriate method than map_complete_blocks.
- [ ] Make the handling of scalar coordinates consistent.
An example of _create_cube being written in a generic way that supports multiple different dimensionalities can be found in iris-esmf-regrid. Especially in this PR, also concerned with unifying regridders https://github.com/SciTools-incubator/iris-esmf-regrid/pull/175.
Thanks @stephenworsley for raising this issue. There would be great benefit in having a uniform API for regridding and interpolation. My take on this:
- Let's distinguish between 3 types of operations: projection, interpolation and regridding. a. Projection is the computation of line integrals, fluxes and volume integrals. The target is a point, line, area or volume, depending on the field staggering. b. Interpolation is the projection of the field onto a point. c regridding is projection applied to grid elements (nodes, edges, faces and cell volumes)
- The mesh. It can be unstructured, structured or rectilinear. The distinction is important because projection, interpolation and regridding require one to quickly find the cell that contains a point. This search is very fast for an uniform grid, but can become slow for a general structured grid and even slower for an unstructured grid. While every structured grid can be represented as an unstructured grid and every structured grid as an unstructured grid, there is a clear advantage in using the method most suited for a grid.
- The field staggering (nodal, edge, face or cell). Each staggering requires its own projection/interpolation/regridding method.
- The dimensionality of space, 3d, 2d, 1d, 2d + vertical axis, ....
From this perspective we can classify each of the existing projection/interpolation and regridding methods. Eg, UnstructuredNearest for the interpolation of a nodal field on an unstructured grid, RectilinearRegridder for regridding of a nodal field, .... Ideally we want all the methods implemented for 2d/3d, different staggerings and mesh types with dispatching the operation to the "right" method based on the grid/field metadata.