Add algorithm for intersection of geometries with a rectangular grid
I am working on a project to expose the exactextract library to Python in a way similar to the R package exactextractr.
Instead of creating separate Python bindings for this library, I am considering attempting to move much of the code of this library into GDAL where it would become accessible through GDAL's Python bindings. Related but not necessarily dependent on this, I am thinking about whether some of the code could move into GEOS -- specifically, an optimized intersection calculation between rectangular grids and polygonal/linear geometries.
The result of the overlay is most commonly expressed as the fraction of each grid cell that is covered by a polygon, although it is also possible to get the total length of linear geometries within each grid cell or (more experimentally) a Geometry representation of each polygon/grid cell intersection, e.g.
This code has a strong test suite and has had pretty good public adoption since the R package was published in 2019, so I think it can be considered reasonably mature.
I don't have a detailed proposal yet, but I imagine that a minimal-footprint addition to GEOS would add something along these lines:
// define a grid
GEOSGrid* GEOSGrid_create(double xmin, double ymin, double xmax, double ymax, double dx, double dy);
// calculate a Grid coincident with `grid` but covering only `geom`
GEOSGrid* GEOSGrid_clip(const GEOSGrid* grid, const GEOSGeometry* geom);
// partition a polygonal geometry by a grid
GEOSGeometry* GEOSGrid_intersection(const GEOSGrid* grid, const GEOSGeometry* geom);
// calculate a raster with the fraction of each grid cell that is covered
// to avoid introducing a matrix type (and having slow access through non-inlined function calls)
// just return a float* and access values with x[i*cols + j] or something
float* GEOSGrid_intersectionAreas(const GEOSGrid*, const GEOSGeometry* geom);
Just thought I'd create an issue for this in case anyone has thoughts. I'm going to aim to have a better formed proposal before the Vienna code sprint.
The proposed GEOSGrid_create has xmin, xmax and dx as floats, which is potentially problematic since dx may not be a multiple of (xmax - ymin). How about something based on a grid shape with offset to the top left corner:
GEOSGrid* GEOSGrid_create(int nrow, int ncol, double dx, double dy, double ul_x, double ul_y);
These parameters can be evaluated from bounding box inputs or raster affine transformation coefficients. There is also the possibility of adding a rotation term too.
But generally, the addition of grid intersection area capabilities are welcome from my perspective! I've done these calculations using various methods, and see value in adding these capabilities to GEOS.
W. R. Franklin proposed a clever algorithm for computing the area of intersection of two polygons without forming the actual geometry of intersection. The advantage of this algorithm is it's simplicity and robustness. There's a (mostly complete) implementation in a JTS branch: OverlayArea
Obviously it would be easy to adapt this to using a rectangle as one of the polygons. Might this be worth looking at?
The proposed GEOSGrid_create has xmin, xmax and dx as floats, which is potentially problematic since dx may not be a multiple of (xmax - ymin).
Yes, there is a definite conflict between how people often talk about grids ("it goes from -180 to 180 at 10 arc-second resolution") and how the math works out. The approach I took is to accept the xmin/xmax based specification and allow the size of the rightmost column/bottom row to adjust to compensate. In the example above the right boundary of the rightmost cell would be adjusted from 179.99999999999994 to 180.
How about something based on a grid shape with offset to the top left corner:
This would work, too.
There is also the possibility of adding a rotation term too.
exactextract does not handle rotated rectangles, though I suppose the same approach could work. Just a trickier implementation.
Obviously it would be easy to adapt this to using a rectangle as one of the polygons. Might this be worth looking at?
A primary advantage of exactextract is calculating the intersection area against all rectangles in a single pass, rather than performing m x n separate intersection calculations.
Would the GEOS API actually get used by 3rd parties? Or is it just to support the exactextract lib whereever it ends up?
I'd hope for it to be used by other GEOS users, otherwise there is no benefit to putting it into the library.
Updated to support multipart polygons and holes. It takes about 5 seconds to subdivide the GADM level 0 boundaries (256 polygons, 32.8 million vertices) against 250,000 grid cell polygons:
I think the primary benefit in GEOS would be this polygon subdivision, but I would need an API to efficiently get the part I need (coverage percent) without materializing all of the result polygons.
Very nice!
Another possible interface -- return an "intersection result" that makes it possible to get either the overlap fractions or geometries either in bulk or individually by row/col.
GEOSGridIntersection* GEOSGrid_intersection(const GEOSGrid*, const GEOSGeometry* geom);
float* GEOSGridIntersection_getIntersectionAreas(const GEOSGridIntersection*);
float GEOSGridIntersection_getIntersectionArea(const GEOSGridIntersection*, size_t row, size_t col);
GEOSGeometry* GEOSGridIntersection_getIntersectionGeometries(const GEOSGridIntersection*);
GEOSGeometry* GEOSGridIntersection_getIntersectionGeometry(const GEOSGridIntersection*, size_t row, size_t col);
I can imagine using this to partition up work, so getting the intersection geometries in a structured way (rather than combined in a GeometryCollection) could be useful.
Does row/col make sense? for a concave shape, there will be places where there is no geometry in some grid cells. Is the upside value of doing row/col worth the extra scruft of a new object to create/destroy?
there will be places where there is no geometry in some grid cells.
This is where row/column access could be useful, because otherwise there is no obvious correspondence between grid cells and result geometries. As an example, I was thinking about a point-in-polygon algorithm that first divides the space into, say, a 100x100 grid. To do to the PIP test, you calculate the cell associated with your point, check to see if that cell has a covered fraction of 0 or 1, and then pull the corresponding geometry to do a regular PIP test only if the fraction is in between.