flopy perf(geospatial): optimize grid.intersect() with KD-tree + ConvexHull

Summary

Optimize grid.intersect() performance across all grid types using KD-tree spatial indexing with pre-computed ConvexHull equations and fully vectorized structured grid with searchsorted. This is all on the assumption that grids will never be non-convex and always layered.

Performance Improvements

geospatial_performance_results.csv ~50-300x speedup

Key Optimizations

KD-tree + vertices indexing: O(log n) candidate cell search using centroids + vertices
Pre-computed ConvexHull equations: Vectorized point-in-polygon testing via hyperplane equations
Bounding box rejection: Fast elimination before expensive polygon tests
Deepcopy fix: Disabled _copy_cache during index build to avoid bottleneck
Vectorized result processing: Eliminated Python loops in grid.intersect() methods

Changes

Add flopy/utils/geospatial_index.py with GeospatialIndex class
Optimize VertexGrid.intersect() - 2D kdtree, 2D convexhull and layer search
Optimize UnstructuredGrid.intersect() varies_by_layers=False - 2D kdtree, 2D convexhull and layer search
Optimize UnstructuredGrid.intersect() varies_by_layers=True - 3D kdtree, 3D convexhull
Optimize StructuredGrid.intersect() with vectorized row/col finding - searchsorted
Add edge case tests for thin sliver cells, boundaries, corners
GeospatialIndex now raises ValueError if passed a StructuredGrid
All methods now return np.nan for outside points (previously StructuredGrid returned (None, nan, nan) with z, and GeospatialIndex.query_point returned None)
Array queries return int64 when all points inside, float64 when nan values present
Matches behavior of native VertexGrid.intersect() and UnstructuredGrid.intersect() on develop
1:1 point-to-cell mapping
All methods guarantee same number of outputs as inputs
Outside points get nan instead of being dropped or raising errors (with forgive=True)

Test coverage

~38 tests covering return type consistency, tie-breaking, 3D layered grids, and boundary handling

Regressions

grids with less than 10 cells or points perform worse due to overhead associated with new methods, but execute in less than 0.5 ms so it doesn't matter.

Nov 23 '25 05:11 aacovski

Codecov Report

:x: Patch coverage is 87.13911% with 49 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 72.7%. Comparing base (556c088) to head (9320199). :warning: Report is 91 commits behind head on develop.

Files with missing lines	Patch %	Lines
flopy/utils/geospatial_index.py	86.9%	49 Missing :warning:

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2654      +/-   ##
===========================================
+ Coverage     55.5%    72.7%   +17.1%     
===========================================
  Files          644      669      +25     
  Lines       124135   129750    +5615     
===========================================
+ Hits         68947    94342   +25395     
+ Misses       55188    35408   -19780

Files with missing lines	Coverage Δ
flopy/discretization/grid.py	`75.7% <100.0%> (-0.2%)`	:arrow_down:
flopy/discretization/structuredgrid.py	`50.3% <ø> (+2.8%)`	:arrow_up:
flopy/discretization/unstructuredgrid.py	`75.6% <ø> (-5.9%)`	:arrow_down:
flopy/discretization/vertexgrid.py	`79.2% <ø> (-4.5%)`	:arrow_down:
flopy/utils/geospatial_index.py	`86.9% <86.9%> (ø)`

... and 556 files with indirect coverage changes

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Nov 23 '25 23:11 codecov[bot]

@wpbonelli I’m looking for some guidance on how to handle these tests. The issue arises when a point lies exactly on the vertices of grid cells, making it ambiguous which of the 3-n cells it should belong to:

FAILED test_grid.py::test_intersection - assert 268 == 267

In these non-deterministic cases, different methods return different cell indices. I can force the kdtree-based methods to return the expected value, but that comes with a performance cost. It then causes another test failure (assert DIS == DISV), since those methods also behave differently. I could also update the structured grid path to enforce consistent behavior so all tests pass.

Before making those changes, I’d like your input on the preferred approach to resolve these cascading test failures.

Nov 25 '25 19:11 aacovski

No performance hit with changing tiebreaking

Nov 26 '25 08:11 aacovski

@wpbonelli I’m looking for some guidance on how to handle these tests. The issue arises when a point lies exactly on the vertices of grid cells, making it ambiguous which of the 3-n cells it should belong to:

@aacovski in the gridintersect module, in case of ties, we decided to always return the lowest grid cell number. But I think that any approach as long as it is clearly documented (and maybe consistent across runs) should be fine?

Nov 26 '25 09:11 dbrakenhoff

Agree with @dbrakenhoff I think lowest number is the existing convention so might as well stick to it if there is no performance cost?

Nov 26 '25 11:11 wpbonelli

Great, that's what I did. I'll consider adding more tests for coverage, but otherwise I think the PR is ready. Thanks

Nov 26 '25 14:11 aacovski

across all grid types

grids will never be non-convex and always layered

does the latter mean the new grid index and related performance boost don't support DISU?

Nov 26 '25 15:11 wpbonelli

does the latter mean the new grid index and related performance boost don't support DISU?

As I understand it, “fully 3D unstructured” grids are more characteristic of FEM models, and I’m not aware of MODFLOW supporting that kind of discretization. That said, my code does support DISU in the sense that varies_by_layers can be set to either True or False... I'm not aware of any FDM or FEM software supporting non-convex elemnts/grids. Please let me know if I'm missing anything.

Nov 26 '25 15:11 aacovski

I am going to clean up docstrings later today as I noticed some old info. Sorry about that. I want to add some more tests too.

Nov 26 '25 16:11 aacovski

@aacovski I guess I just didn't properly understand what you meant by "always layered"

Nov 28 '25 17:11 wpbonelli

it's maybe worth noting that the index assumes that x/y cell centers are in fact centroid coords?

I hadn’t considered that the provided cell centers might not correspond to geometric centroids even for convex polygons. The index shouldn’t assume that, so I’ll update to clarify how cell centers are interpreted. Thanks for pointing it out!

Nov 29 '25 03:11 aacovski

The index shouldn’t assume that [the provided cell centers correspond to geometric centroids], so I’ll update to clarify how cell centers are interpreted.

Ah, ok- and the index doesn't even assume the center is inside the cell? It just performs better if so?

I hadn’t considered that the provided cell centers might not correspond to geometric centroids even for convex polygons.

I think the intention was/is for it to be cell centroid, even though we don't check this when provided by the user or read from a binary grid file. Evidence:

https://github.com/modflowpy/flopy/blob/e42f51117d1e0db8d9d7eb22eb8eae713a98262e/flopy/mf6/utils/binarygrid_util.py#L327 defines cell centers as centroids, maybe (but in any case suggestively) as a typo
- https://github.com/modflowpy/flopy/blob/e42f51117d1e0db8d9d7eb22eb8eae713a98262e/flopy/mf6/utils/binarygrid_util.py#L329
https://github.com/modflowpy/flopy/blob/e42f51117d1e0db8d9d7eb22eb8eae713a98262e/flopy/utils/cvfdutil.py#L17 is used when calculating x/y cell centers in other routines
- https://github.com/modflowpy/flopy/blob/e42f51117d1e0db8d9d7eb22eb8eae713a98262e/flopy/utils/cvfdutil.py#L143
- https://github.com/modflowpy/flopy/blob/e42f51117d1e0db8d9d7eb22eb8eae713a98262e/flopy/utils/cvfdutil.py#L306
- https://github.com/modflowpy/flopy/blob/e42f51117d1e0db8d9d7eb22eb8eae713a98262e/flopy/utils/cvfdutil.py#L369
https://github.com/modflowpy/flopy/blob/e42f51117d1e0db8d9d7eb22eb8eae713a98262e/flopy/utils/geometry.py#L739 is used from
- https://github.com/modflowpy/flopy/blob/e42f51117d1e0db8d9d7eb22eb8eae713a98262e/flopy/discretization/unstructuredgrid.py#L1062

Maybe something to clarify in the MF6IO guide. For DIS it doesn't matter but it becomes relevant for DISV/U.

Dec 01 '25 14:12 wpbonelli

@aacovski it wasn't really clear from previous comment but I think it's best to revert the docstrings to say centroid rather than leaving it ambiguous. my first comment was really to point out the (unwanted?) ambiguity in the existing grid classes, not to say your new ones should change

Dec 02 '25 11:12 wpbonelli

Another thought: why not move the structured grid searchsorted query implementation to GeospatialIndex? That way all the Grid.intersect methods could use the index. Since the index receives the grid on init, it knows what kind it is. This seems a bit neater, and also opens the way for cases where a structured grid query might still find the KD tree useful, e.g. ball queries.

Also just want to say, thanks for working on this. This seems like it will be a great asset both in the near term and longer term- in 4.x it could sit behind xarray-based user APIs, whether builtin .sel syntax (point queries) or, for the fancy stuff, maybe an accessor.

Dec 02 '25 21:12 wpbonelli

I'll work on implementing that: moving intersect to base class and remove the subclass intersect methods. Regarding the docstrings: since there is a difference between centroid and centers, based on everything I've read, it makes more sense to use the more general case (centers) rather than centroids which could be inaccurate.

Dec 02 '25 23:12 aacovski

I'm pretty sure "cell centers" is meant as centroids, that's why I'm suggesting to change the doctrings up in the grids themselves. But I could be wrong. Hopefully someone with more tenure can chime in.

Dec 03 '25 00:12 wpbonelli

I'm pretty sure "cell centers" is meant as centroids, that's why I'm suggesting to change the doctrings up in the grids themselves. But I could be wrong. Hopefully someone with more tenure can chime in.

For DISV grids the user provides the x and y node coordinates for each cell. In general, this coordinate is the centroid of the cell, however, there is at least one situation where this is not the case. Along the edge of the model domain, there may be half cells (common with Voronoi) where it may be beneficial to assign the x,y cell center coordinate to be on the model domain boundary (and the edge of the cell). The location of a constant-head boundary (or constant concentration), for example, would effectively be assigned to the node coordinate, which a user may want to put on the model edge. This x, y node coordinate is used by DISV (and DISU in some cases) to calculate geometric properties for connections between adjacent cells.

Dec 03 '25 13:12 christianlangevin

thanks @christianlangevin. so @aacovski the more general "cell centers" is better as you suggest. we may update the description upstream in mf6io but no need to change the grid docstrings here

Dec 03 '25 14:12 wpbonelli