iris Unify regridders

📰 Custom Issue

There are a collection of regridders in iris which each work slightly differently, but are similar enough that they could potentially benefit from unifying some of their behaviour and perhaps have them inherit from the same abstract class. This would also benefit regridders built outside of iris such as in iris-esmf-regrid. The current set of iris regridding schemes is:

Linear
Nearest
AreaWeighted
UnstructuredNearest
PointInCell

There is also the deprecated regridder in iris.experimental.regrid_conservative which is functionally replaced by iris-esmf-regrid.

The current set of regridders have the following quirks:

Linear and Nearest

The Linear and Nearest regridding schemes create a RectilinearRegridder (from analysis._regrid). This uses _RegularGridInterpolator (from analysis._scipy_interpolate) to calculate weights and then use those weights, this derives from scipy and is under a scipy copyright, though the code has developed significantly since it was copied from scipy.
Source and target grid must be lat/lon DimCoords.
Weights are not cached on RectilinearRegridder, though _RegularGridInterpolator does offer weight caching.
Weights matrix is a sparse csr_matrix for Linear, for Nearest, weights take the form of an index rather than a matrix.
Loops through 2D slices for calculations (rather than using transposition and reshaping like in iris-esmf-regrid).
Has interpolator method.
Has powerful _create_cube method which handles derived coordinates. Even handles derived coordinates covering the grid dimension via the use of a regridder function callback.
Lazy regridding is supported via map_complete_blocks.
Additional keywords: extrapolation_mode

AreaWeighted

Creates AreaWeightedRegridder (from analysis._area_weighted).
Source and target grid must be lat/lon DimCoords.
Weights are cached on __init__.
Uses python for loop for calculations rather than a sparse matrix.
Uses RectilinearRegridder._create_cube with the callback being equivalent to Linear. This means that a different regridding method is acting on the AuxCoords and the data.
The regridder function equivalent regrid_area_weighted_rectilinear_src_and_grid handles scalar grid coords, though the regridder class itself does not.
Requires special handling of masked data in calculations.
Lazy regridding is supported via map_complete_blocks.
Additional keywords: mdtol

UnstructuredNearest

Creates UnstructuredNearestNeighbourRegridder (from analysis.trajectory) which uses analysis.trajectory.interpolate for calculations.
Target grid must be lat/lon DimCoords, Source grid must be lat/lon AuxCoords of any dimensionality which map over the same cube dimensions.
Weights are not cached.
In the current calculation, "weights" are fancy indices.
Mask handling is not currently proper (see #4463), but could be made similar to Nearest.
Cube creation handled mostly by trajectory.interpolate rather than RectilinearRegridder._create_cube. This handles derived coords but not those which cover the grid dimensions, so a callback is not used.
Lazy regridding not supported.

PointInCell

Creates CurvilinearRegridder (from analysis._regrid)
Target grid must be lat/lon DimCoords, Source grid must be 2D lat/lon AuxCoords which map over the same cube dimensions.
Weights aren't cached on __init__ but are cached on __call__.
Weights matrix is a sparse csc_matrix (rather than the csr_matrix used in Linear).
Loops through 2D slices for calculations (rather than using transposition and reshaping like in iris-esmf-regrid).
Cube creation happens during perform inside loop over 2D slices. The resulting cube is a result of merging. Derived coordinates are not handled.
Unlike other regridders, caching happens within the __call__ function.
Lazy regridding not supported.
Additional keywords: weights

Suggestions

Currently, the most sophisticated regridder for metadata handling is RectilinearRegridder. The problem seems to be that other regridders have slightly different use cases so they can't fully take advantage of the infrastructure in RectilinearRegridder. These regridders could be improved by having them derive from a regridder class which is more generic than RectilinearRegridder. This could also help the creation of future regridders (like in iris-esmf-regrid). Some ideas for how this regridder might be structured:

_AbstractRegridder

Is an abstract class.
Source/target grid Coords can be 1D lat/lon DimCoords over two separate dimeansions or lat/lon AuxCoords of any dimensionality (likely just 1D or 2D) over common dimensions.
Additional kwargs are stored and passed into relevent methods.
Methods for extracting (and checking validity of) grid Coords implemented for DimCoords and AuxCoords with a view that this could be extended for Mesh in a subclass.
Store source and target grid.
Cache weights on __init__.
Methods for checking compatibility of cubes in call and fetching grid dimensions are implemented for DimCoords and AuxCoords with a view that this could be extended for Mesh in a subclass.
Methods for calculating and applying weights are probably not implemented here, though the application of weights is likely to be common for some regridders so it may be appropriate to have helper functions for common calculations.
More generic version of RectilinearRegridder._create_cube.
The new _create_cube method should call on methods for adding the grid Coords with a view that this could be extended for Mesh in a subclass.

Rough proposed common structure of regridders:

The following is copied from a comment in #4807 where it demonstrates a structure to aim for which would allow the possibility to derive from a common class. The aim here is to bring all regridder closer to this structure, improving their functionality along the way, and then refactor this structure in terms of class inheritance when that becomes possible.


def regrid(data, dims, regrid_info):
  ...
  return new_data

def create_cube(data, src, src_dims, tgt_coords, callback):
  ...
  return result_cube

def _prepare(src, tgt):
  ...
  return regrid_info

def _perform(cube, regrid_info):
  ...
  dims = get_dims(cube)
  data = regrid(cube.data, dims, regrid_info)
  callback = functools.partial(regrid, regrid_info=regrid_info)
  return create_cube(data, src, dims, tgt_coords, callback)

class Regridder:
  def __init__(src, tgt):
    ...
    self._regrid_info = _prepare(src, tgt)
  def __call__(src):
    ...
    return _perform(src, self._regrid_info)

Sub-tasks/Side-tasks

Aside from the larger task of refactoring all the regridders to derive from an abstract class, there are several smaller tasks which could help to unify the behaviour of regridders. Bear in mind some of these tasks may have redundancy with the larger refactoring task.

[x] Rewrite AreaWeightedRegridder to use sparse matrices (similar to iris-esmf-regrid, assuming this improves performance). #5365
[ ] Decide between csc_matrix and csr_matrix for consistency, comparing performance.
[x] Rewrite the calculations in CurvilinearRegridder and RectilinearRegridder to work on whole cubes rather than slices. #4807
[x] Fix UnstructuredNearest mask handling. #5062
[x] Rewrite the way callbacks work in _create_cube so that they can be consistent with the regridder they are being used in. #4807
[ ] Make calculation lazy where possible. This may have to be done via a different method for nearest neighbour regridders which seem to work by indexing, so there may be a more appropriate method than map_complete_blocks.
[ ] Make the handling of scalar coordinates consistent.

May 19 '22 12:05 stephenworsley

An example of _create_cube being written in a generic way that supports multiple different dimensionalities can be found in iris-esmf-regrid. Especially in this PR, also concerned with unifying regridders https://github.com/SciTools-incubator/iris-esmf-regrid/pull/175.

May 20 '22 09:05 stephenworsley

Thanks @stephenworsley for raising this issue. There would be great benefit in having a uniform API for regridding and interpolation. My take on this:

Let's distinguish between 3 types of operations: projection, interpolation and regridding. a. Projection is the computation of line integrals, fluxes and volume integrals. The target is a point, line, area or volume, depending on the field staggering. b. Interpolation is the projection of the field onto a point. c regridding is projection applied to grid elements (nodes, edges, faces and cell volumes)
The mesh. It can be unstructured, structured or rectilinear. The distinction is important because projection, interpolation and regridding require one to quickly find the cell that contains a point. This search is very fast for an uniform grid, but can become slow for a general structured grid and even slower for an unstructured grid. While every structured grid can be represented as an unstructured grid and every structured grid as an unstructured grid, there is a clear advantage in using the method most suited for a grid.
The field staggering (nodal, edge, face or cell). Each staggering requires its own projection/interpolation/regridding method.
The dimensionality of space, 3d, 2d, 1d, 2d + vertical axis, ....

From this perspective we can classify each of the existing projection/interpolation and regridding methods. Eg, UnstructuredNearest for the interpolation of a nodal field on an unstructured grid, RectilinearRegridder for regridding of a nodal field, .... Ideally we want all the methods implemented for 2d/3d, different staggerings and mesh types with dispatching the operation to the "right" method based on the grid/field metadata.

May 30 '22 22:05 pletzer

iris iris copied to clipboard

Unify regridders

📰 Custom Issue

Linear and Nearest

AreaWeighted

UnstructuredNearest

PointInCell

Suggestions

_AbstractRegridder

Rough proposed common structure of regridders:

Sub-tasks/Side-tasks

iris
iris copied to clipboard