feature: faster and better point intersection in `GridIntersect`
Is your feature request related to a problem? Please describe.
GridIntersect.intersect() is slow for points because it has to perform an intersection between the MultiPoint and each of the gridcells within the spatial query that limits the search space. And GridIntersect.intersects() doesn't really return a useful result, since it only returns the addresses of touched cells, and not a link between the input points and their location in the grid.
Describe the solution you'd like
For points it would be more useful if the intersects() method would return an array of len(points) with the address of the grid cell each point intersects with.
import shapely
from flopy.utils.geospatial_utils import GeoSpatialUtil
from flopy.utils import GridIntersect
grid = ... # some grid
x = np.arange(10)
y = np.arange(10)
pts = shapely.points(x, y)
gi = GridIntersect(grid)
gi.intersects(pts) # --> result with len(npts) with cellids, or nan if outside of grid
Additional context This change will introduce new behavior for arrays of geometries, which were not really supported before. I'm worried it might make the intersect methods a bit more confusing to navigate since the result depends on the input type...
- intersects():
- MultiPoint --> Cellids containing 1 or more points
- array of points --> cellid per point
- intersect():
- MultiPoint --> Points grouped per cell
- array of points --> cellid per point (aggregation left to the user?)
So I'd welcome any suggestions on how to support the arrays, without making it too confusing? Perhaps a separate array-based method?
This is related to #2649.
I'd recommend optimizing the existing Grid.intersect(x, y) methods rather than adding new APIs. MultiPoint is slow.
The optimization could use scipy.spatial.cKDTree on cell centroids:
- Query k-nearest cells via KDTree (fast)
- Test exact containment for only those candidates (
scipy.spatial.ConvexHull) - Lines/Polygons: Extract vertices (with densification if sparse), use hull equations for fast containment tests using cell vertices
Caveat: For large/sparse geometries (e.g., a long LineString with only 2 vertices spanning 100 cells), the ConvexHull-based candidate selection might miss cells along the path since it only queries near the geometry's centroid. Densifying such geometries before computing the ConvexHull ensures complete cell coverage - but even with densification overhead, the approach is very likely still faster.
For GridIntersect, I'd suggest renaming for clarity:
intersect():get_intersection_details()(returns lengths/areas/geometries)intersects(): deprecate (replaced by optimized grid.intersect() for simple cellid lookups)
This keeps the separation clean:
grid.intersect(x, y)- Simple, fast point→cellid lookup (optimized with KDTree)GridIntersect.get_intersection_details()- Complex geometric intersection analysis
I apply a similar method in MINEDW for mapping 6-7GB DXF files to FEM mesh elements (one geometry → many cells). The FloPy use case is inverted (1:1 point → cell mapping), but KDTree pre-filtering with exact containment tests still provides massive speedup over brute force.
Thanks for your thoughts. The reason GridIntersect was added was to implement more complex intersection operations while keeping flopy's core as lean as possible in terms of dependencies (i.e. no shapely). There's certainly an argument to be made that if shapely becomes a full dependency, GridIntersect should be moved entirely to the grid class. I'm not actually sure what the policy currently is on the dependencies. Perhaps @wpbonelli can clarify (if there is one)?
In the meantime, the intersection operations are already in GridIntersect, so we might as well improve them. A separate issue/PR can then focus on streamlining the differences between GridIntersect and Grid :).
I will put together a PR, and we can decide where to go from there once it's ready.
@aacovski, you can see my point location implementation in #2657. No idea how it compares to your work, but mine is still only implemented for vertex and structured grids at this moment. But seems worthwhile to include this in the GridIntersect class nonetheless, so I think it's okay if there are 2 implementations within flopy.
I'm not actually sure what the policy currently is on the dependencies
Neither are we, to my knowledge. I think the (de facto, little discussed) approach has been as you say, keep requirements minimal and almost everything else in an optional group called optional. About 2 years ago we pulled pandas into the core as it's a foundational part of the ecosystem now, and it's also kind of a "core" thing (tabular data in the object model) where e.g. shapely is more peripheral (pre/post-processing) but I don't know how useful/accurate that distinction is- most people will be doing pre/post when they use the model objects..
I'd say we're open to suggestions? I looked at what some other libraries do here. Seems like a mix. I can see pros/cons of both extremes.