parcels icon indicating copy to clipboard operation
parcels copied to clipboard

A clear API for handling grid axes, barycentric coordinates, index searching, and ravelling.

Open VeckoTheGecko opened this issue 8 months ago • 8 comments

Terminology[^1]:

  • Parcels Grid: The logical grid representation (via XGrid or UxGrid) used by Parcels to determine particle position. In structured grids, this corresponds to the grid of F-points (see #2037). Time is not part of the grid abstraction.
  • Axis (Axes): The spatial dimensions of the Parcels Grid. For structured grids: X, Y, Z; for unstructured: Z and FACE. These relate closely to the underlying data layout (e.g., in the structured case, only a small correction about left/right grid indexing is required to get to the underlying data representation). Indexing a particle involves locating it along each axis.
  • Index (Indices): The cell position of a particle along a given axis. Combined indices define the full grid cell a particle occupies.
  • Barycentric Coordinates: The coordinates defining a particle’s position within a grid cell. In structured grids, this is a single coordinate per axis (e.g., xsi); in unstructured, this can be a mix (e.g., a single coordinate for zeta, and n-coordinates for an n-gon that the particle resides in - 3 for triangular cells).
  • Encoded Indices (ei): A compact, single-integer encoding of a particle’s full grid indices.

Locating a particle’s cell and barycentric coordinates within a Parcels Grid is essential for interpolation. This functionality should reside in the Grid class, exposing a clear API that the Field class can use for interpolation—ensuring separation of concerns, so Field need not manage grid geometry.

In the structured case, Parcels must support grids with varying axis dimensionality:

  • 3D:
    • Options: 2D lat + 2D lon + 1D depth (curvilinear), or 1D lat + 1D lon + 1D depth (rectilinear)
    • Use case: 3D U/V fields
  • 2D:
    • Options: 2D lat + 2D lon, or 1D lat/lon + 1D depth
    • Use case: Wave flumes, windage
  • 1D: Single 1D axis (lat/lon/depth)
  • 0D: Constant value everywhere (constant field)

Data may exist on any subset of the grid axes (e.g., U may be on X, Y, Z; depth-averaged U on X, Y). During interpolation, the particle’s full position is used, but only relevant axis indices (e.g., xi, yi) are needed to access the data.

Proposed API

>>> x, y, z = ..., ..., ... # float float float
>>> grid = ...

.search(z, y, x)

>>> grid.search(z, y, x)
# returns:
# {
#   axis: (index, bcoords), ...
# }

# if grid is 3D structured
{"X": (xi, xsi), "Y": (yi, eta), "Z": (zi, zeta)}

# if grid is 2D structured (no depth)
{"X": (xi, xsi), "Y": (yi, eta)}

# if grid is 1D structured (depth)
{"Z": (zi, zeta)}

# if grid is unstructured
{"Z": (zi, zeta), "FACE": (fi, bcoords)}
# e.g., {"Z": (2, 0.5), "FACE": (200, np.array(0.2, 0.3, 0.5))}

ravel_index(axis_indices)

>>> axis_indices = {"X": xi, "Y": yi, "Z": zi} # 3d structured grid
>>> axis_indices = {"X": xi, "Y": yi} # 2d structured grid
>>> axis_indices = {"Z": zi} # 1d structured grid (depth)
>>> axis_indices = {"Z": zi, "FACE": fi} # unstructured grid
>>> grid.ravel_index(axis_indices)
# returns ei
Old API for reference

Old approach

# XGrid class
def unravel_index(self, ei):
    """
    Converts a single encoded index back into a Z, Y, and X indices.

    Parameters
    ----------
    ei : int
        Encoded index to be unraveled.

    Returns
    -------
    zi : int
        Vertical index.
    yi : int
        Y index.
    xi : int
        X index.
    """

def ravel_index(self, zi, yi, xi):
    ... # inverse of unravel_index

# UxGrid class
def unravel_index(self, ei):
    """
    Converts a single encoded index back into a vertical index and face index.

    Parameters
    ----------
    ei : int
        Encoded index to be unraveled.

    Returns
    -------
    zi : int
        Vertical index.
    fi : int
        Face index.
    """

def ravel_index(self, zi, fi):
    ... # inverse of unravel_index

# applies to both XGrid and UxGrid
def search(self, z: float, y: float, x: float, ei=None, search2D: bool = False):
    """
    Perform a spatial (and optionally vertical) search to locate the grid element
    that contains a given point (x, y, z).

    This method delegates to grid-type-specific logic (e.g., structured or unstructured)
    to determine the appropriate indices and interpolation coordinates for evaluating a field.

    Parameters
    ----------
    z : float
        Vertical coordinate of the query point. If `search2D=True`, this may be ignored.
    y : float
        Latitude or vertical index, depending on grid type and projection.
    x : float
        Longitude or horizontal index, depending on grid type and projection.
    ei : int, optional
        A previously computed encoded index (e.g., raveled face or cell index). If provided,
        the search will first attempt to validate and reuse it before falling back to
        a global or local search strategy.
    search2D : bool, default=False
        If True, perform only a 2D search (x, y), ignoring the vertical component z.

    Returns
    -------
    bcoords : np.ndarray or tuple
        Interpolation weights or barycentric coordinates within the containing cell/face.
        The interpretation of `bcoords` depends on the grid type.
    ei : int
        Encoded index of the identified grid cell or face. This value can be cached for
        future lookups to accelerate repeated searches.

    Raises
    ------
    FieldOutOfBoundError
        Raised when the queried point lies outside the bounds of the grid.
    NotImplementedError
        Raised if the search method is not implemented for the current grid type.
    """

Benefits of this approach

  • Clarifies the return values of .search() and .ravel_index() based on the Grid’s Axes.
    • For 2D grids, only relevant axes are returned - e.g., no need to return zi, zeta = (0, 0). This simplifies interpolator logic and removes the need for grid dimensionality checks in the interpolator.
    • Removes the need to coerce arrays to 4D, superseding #2047.
  • Unifies the Grid API, stabilizing BaseGrid behavior across subclasses (XGrid, UxGrid) and enabling future extensions.
  • Clarifies the link between Axes and barycentric coordinates.
  • Provides full particle position on the grid via the new API (including the zeta value).
  • Improves maintainability, as all grid logic stays within the Grid class.

Additional considerations

  • Caching ei may be unnecessary for 0D or 1D grids, where lookups are already fast.
  • Unclear role of search2D parameter. Is this legacy or is it still needed?
  • Sigma layer support: Future users must provide F-points for depth (i.e., a 3D depth array based on X and Y via bathymetry). This would require users to broadcast the sigma ratio with their bathymetry before passing to Parcels (which I think is a good direction for us to go in when we get around to it).
  • Particles still live in z,y,x space (just this might not be relevant when indexing on some grids). For surface advection we can have z=0

Keen to hear your thoughts (on concept, implementation or terminology) here.

[^1]: Once finalised, I think we would benefit from documenting this terminology (internally and in documentation for those interested in writing interpolators).

VeckoTheGecko avatar Jun 26 '25 11:06 VeckoTheGecko

@fluidnumerics-joe and @erikvansebille , keen to hear what you think. Let me know if anything is unclear

VeckoTheGecko avatar Jun 26 '25 11:06 VeckoTheGecko

Nice write-up, @VeckoTheGecko! I think this is indeed very clarifying, but one point to add:

I would also call Time an axis. There is no real distinction, as far as Parcels Fields are concerned, between the spatial and time dimensions. There's also an equivalent 'barycentric coordinate' for time, which I propose we call tau:

tau = (time - grid.time[ti])/(grid.time[ti+1-grid.time[ti])

If you agree, can you update the description above to incorporate time?

erikvansebille avatar Jun 26 '25 11:06 erikvansebille

I intentially left out time here since I think incorporating it into the Grid class (i.e., having time evolving grids) adds a lot of complexity that is a bit unknown for the moment.

I think we should instead rename in the above proposal Axis to Spatial Axis to be clear that time is not considered part of the grid - does that sound good?

VeckoTheGecko avatar Jun 26 '25 11:06 VeckoTheGecko

Well, I'm not sure that's the best strategy. I can imagine that in the end, users will want to have their (ti, zi, yi, xi) and (tau, zeta, eta, xsi) for the interpolation. The Grid can also have a time dimension?

Plus, as you say, we will want to go to time-varying grids at some point (not v4-alpha, but perhaps v4.0 or at least v4.1), so then we will have to move time index search here too

erikvansebille avatar Jun 26 '25 11:06 erikvansebille

I can imagine that in the end, users will want to have their (ti, zi, yi, xi) and (tau, zeta, eta, xsi) for the interpolation

Yes, 100%, in the interpolator it would look something along the lines of:

ti, tau = field.search_time_indices(...)
particle_position = grid.search(z,y,x)
...# continue with interpolation using the data

Plus, as you say, we will want to go to time-varying grids at some point (not v4-alpha, but perhaps v4.0 or at least v4.1), so then we will have to move time index search here too

Yes, at some point. I don't know what complexity adding time varying grids entails (I imagine its not trivial) - for now I just want to limit development to the Spatial Axes so that we have a clearly defined responsibility for the Grid class to support work for v4-alpha and v4. In future we can explore time varying grids - then it would make it into the grid class (the way this is set up it would be an isolated change).

If you have ideas on what indexing on a time varying grid looks like, please put up an issue so that we can discuss (even just an issue with names of models on time varying grids, or linking relevant resources would be beneficial).

VeckoTheGecko avatar Jun 26 '25 11:06 VeckoTheGecko

I think there are two quite fundamentally different pathways here. One is to treat the time axes separate to the spatial axes, as you propose above. The other is to treat all four axes (time, depth, lat, lon) the same. I think I'd advocate for the latter.

Below some of the reasoning:

  1. An xarray dataset typically has time as one of the dimensions and coordinates, so this approach would closely follow xarray layout.
  2. Time should be part of the Grid specification, in my opinion. I would say that a bathymetry dataset (lat, lon) and a sea surface height dataset (time, lat, lon) have different Grids, because they have different time dimensions
  3. I've seen in v3 that having to deal with 2D, 3D and 4D grids is very cumbersome and requires lots of internal logic. That is why I like the idea of coercing every Xgrid Field to a 4D array
  4. The Field interpolation API (the square brackets) already assumes that every Field is 4D, so users expect that
  5. You are right that interpolation methods will need logic to deal with length-1 dimensions, but they would otherwise also need logic to distinguish between 2D and 3D arrays, so I don't really see the benefit

In your example API above (grid.search(z, y, x)), what happens if a grid is 2d in latitude/depth, instead of latitude/longitude? This is the case in for example wave-flume simulations. Or what if it's 1D in latitude (e.g. for sun angle)

Perhaps best to discuss this in a chat?

erikvansebille avatar Jun 27 '25 06:06 erikvansebille

@VeckoTheGecko and I just had a quick chat to discuss the points above and agreed that

  1. point 3 above is not really relevant because xdcm works with labelled axes, so no need for the 4D coercion
  2. that we will first implement Xgrids without time, but explicitly leave the option open for time to become part of the Xgrid search too.

We see two downsides of first implement Xgrids without time: this may need some work on UxGrids too (correct, @fluidnumerics-joe?) and that it would mean that Fields that share a spatial grid but not have the same time dimension will require an extra encoded index, leading to larger memory footprint.

On the other hand, including time in the Grid facilitates time-varying coordinates (such as depth), and means that we can store the time index (ti) with the spatial indices in one encoded index. Note that in v3 we don't store ti at all, because it is by definition always 0 because of the two time slices that we keep in memory. But this part of the code will be dropped in v4, so we will need to keep track of ti (or compute it on the fly) for every time coordinate

erikvansebille avatar Jun 27 '25 10:06 erikvansebille

Some background a bit to make sure we're all on the same page...

At the moment, the Field.data objects hold either the XArray or UXArray dataarray for each field. Within UXarray, the Field.data.uxgrid holds only the lateral components of the grid (face, vertex, edge information in lat,lon or cartesian). The vertical positions of each layer and layer interface are stored via separate dimensions and coordinates (nz for the layer interface positions and nz1 for the layer centers). The Field.grid object (being either a Parcels.XGrid or Parcels.UXGrid) has been a nice clean way to define a consistent API between structured and unstructured grids to enable particle searching.

In issue #2031 and PR #2032, we discussed the necessity of adding the layer interface positions directly as an attribute of the Parcels.UXGrid object. While it felt a little redundant, because this information is already in the Field.data object, having Parcels.UXGrid.z = Field.data['nz'] under the Parcels.UxGrid, we were able to push vertical grid searching logic entirely in the search method. The hitch here at the moment, is that we are now relying on the end user to properly set this (a topic of discussion for another day, perhaps?)

On searching for the time indices...

At the moment, time index searching is handled in a separate module _index_search.py in the search_time_indices.py . As more of the search logic is put underneath the XGrid and UXGrid classes, it appears this module is shrinking. The last remaining routine here, I think, will just be this search_time_indices.py. Currently, we're calling this method directly in Field.eval (See https://github.com/OceanParcels/Parcels/blob/v4-dev/parcels/field.py#L305) and then calling Field.grid.search. What I like about this setup is the code re-use; both Xarray and UXarray data arrays share the same time searching logic. If there were a way to have an internal API function inherited from BaseGrid, that would be good. Alternatively, if we have a bit of code duplications between XGrid and UXGrid search methods, that wouldn't be the end of the world.

To your point though @erikvansebille , pushing time searching under the grid would help encapsulate ti inside the encoded indices.

@VeckoTheGecko - Thanks for setting up the glossary/language to make communicating about this much clearer. This is tremendously helpful

fluidnumerics-joe avatar Jun 27 '25 14:06 fluidnumerics-joe