geos
geos copied to clipboard
C API for shared edge simplification?
Being able to correctly simplify shared boundaries between polygons (or in theory any linear features with shared edges) is an ongoing need for users of GEOS (e.g., GeoPandas #1387, Shapely #1425, R Simple Features #381 to name a few). Indeed, some of these questions come up because users are surprised that the topology-preserving simplification in GEOS instead means that simplification is performed on a per-geometry basis and that outputs are individually topologically correct, yet the use within a set of polygons of shared boundaries produces gaps and overlaps (not topologically correct for the set of polygons).
There are packages in various ecosystems that help perform this simplification, including topojson, mapshaper, and others. In addition to the algorithms for performing the simplification of shared boundaries, these packages also center around TopoJSON as a representation of the topological relationships.
Whether or not GEOS should include a topological representation is a bigger issue that is outside the scope of what I'm trying to raise here. Instead, I wanted to see if there was interest in a more simplistic approach within GEOS as it exists now and what that API might look like.
To grossly simplify the simplification process (as I understand it):
- given input linear features that may have underlying shared edges
- find all shared edges and decompose geometries into sets of those while retaining unshared edges
- perform some sort of simplification to shared and unshared edges; for each shared edge the simplification is performed once no matter how many geometries share that edge
- topological cleanup (e.g., dealing with holes in polygons whose boundaries have now been moved, so that output polygons are individually topologically correct)
- reassemble geometries
For GEOS, perhaps this could be presented in an API that consumes and emits collections of geometries without needing an alternative I/O representation. Ideally this could be generalized across simplification algorithms (e.g., DP, VW, etc), rather than needing a separate implementation for each.
This seems like it could also leverage some of the existing capabilities and ongoing work around things like coverage union (e.g., #638) as well as STRtree since shared edge detection is a slow step that needs a spatial index.
From a C API perspective, I think it could look something like
*GEOSGeometry GEOSSharedEdgeSimplify(
GEOSGeometry* g,
double edge_det_tolerance,
GEOSSharedEdgeSimplifyCallback callback,
void *userdata
)
(don't get fixated on the names, they are just examples)
The input GEOSGeometry would probably need to be a GeometryCollection because:
- we may want to support cases of polygons and lines along their boundaries
- polygons may have overlaps (because data are not clean), which means that MultiPolygons are not valid.
edge_det_tolerance
is for detecting shared edges; I think of this similar to gridsize
in terms of set precision. Input edges may be very close to shared but have non-identical vertices; this tolerance parameter could be used to coerce them into being shared edges (I think?).
GEOSSharedEdgeSimplifyCallback
is what makes this generalizable. It would look something like:
*GEOSGeometry GEOSSharedEdgeSimplifyCallback(GEOSGeometry *geometry, int is_shared, void *userdata)
The callback would take as input:
- LineString (?) for shared edges, and is_shared would be 1
- Polygon, LineString, Point, etc for isolated geometries, and is_shared would be 0
The output would need to be of lesser or equal dimensionality (if we allow collapse). If is_shared is 1, it must collapse no further than a point representing the start and end point locations (if we allow collapse), or a LineString connecting those.
The caller would provide this callback, and perform whatever simplification algorithm they want within it. Perhaps they would call the DP simplification already available in GEOS or VW once it becomes available, or they could provide an algorithm outside GEOS.
userdata
could be a struct that holds whatever simplification parameters they want to pass to their simplification algorithm; this is where they would pass in the actual tolerance parameter for their simplification.
One of the important things we have to watch for here is that we need to (at least optionally) be able to get back out a GeometryCollection that is exactly the same length and same order as the input GeometryCollection, where input and output geometries are related 1:1. This is so that we can relate the outputs back to anything we have attached to the inputs (e.g., other record-level data).
Thoughts?
/cc @martinfleis @jorisvandenbossche @mattijn @edzer
Thanks for starting to think about this. The core functionality of coverage building and simplification is still on the drawing board, to it's a little early to set any kind of API in stone. This is a good reminder that coverage handling (cleaning, simplifying, merging, overlay) are currently un-handled parts of the GIS pantheon that many many users of GEOS are eager to see filled in.
I would greatly love to see something like this available either through GEOS or another C/C++ library! The sheer amount of complex javascript-to-xxx bridges in use just to expose the mapshaper simplification operations to other non-javascript tools is testament to how widespread the demand for this operation is :scream_cat:
Cc @rsbivand @paleolimbot
This would be fantastic!
Just a note that I think that S2 can do this via the builder, which can simplify multiple objects at once without creating new intersections (https://github.com/google/s2geometry/blob/master/src/s2/s2builder.h#L113-L117). Making it accessible via the R package isn't something I've tried yet, though, so I don't know if the result is exactly what y'all are looking for here.
Thumbs up! When starting with python topojson I thought I could access speedy GEOS functions through shapely, but as turned out this was not the case and so I decided to write it out in python using dicts and numpy array broadcasting where possible. It has been an interesting experience and have documented the approach I've adopted. The process was more complex than I'd anticipated..
My limited experience with S2 (through this issue https://github.com/mattijn/topojson/issues/50) is that the edges are actually not shared when zoomed in. This also fits the narrative that S2 is focused on spheres.
I'm also not sure how one could do this without defining a topological representation within GEOS.
I will subscribe to this issue, but not sure if there is anything I can contribute here.
Thanks @brendan-ward for the detailed suggestion and analysis. As Paul alluded to there is some early-days work being done on improving the GEOS functionality for handling polygonal coverages. Clearly coverage simplification is of great interest, so that helps to focus the development effort.
As Paul also says, the API will be ultimately be driven by the implementation. Some of the capabilities you suggest are obviously core to coverage simplification, while others imply a greater design effort. For instance, supporting multiple simplification algorithms will require significantly more complex code and a wider API. I think the initial effort will focus on VW simplification, since it is usually considered to provide a better quality result for polygons. Supporting a snapping tolerance is also more complex than simply processing the data as is. But I'm optimistic that coverage misalignment can be handled in a graceful way, where the output won't be any worse than the input, at least.
Most likely this work will involve creating an internal topological data structure, but as you say this doesn't need to be exposed in the C API. Definitely the output elements should match the input elements 1-1, so that they can be easily mapped by client code. In particular, this will allow GEOS Coverage Simplification to be exposed as a window function in PostGIS.
Stay tuned!
supporting multiple simplification algorithms will require significantly more complex code and a wider AP
To be clear, the callback idea was specifically intended to mitigate that, requiring the caller to implement that using available GEOS C API simplification methods. However, this occurs in a very hot loop in the process, so in addition to putting an awful lot of responsibility on the callback (to pass back in a valid geometry), it may a big performance hit.
the initial effort will focus on VW simplification
👍 This is a great idea.
Fully understood and agreed that figuring out the API before enough of the implementation is in place is putting the cart before the horse. The goal was definitely not to set the API in stone or use it to constrain the implementation, but rather to use that to help get the conversation rolling, since that is how many of us outside GEOS would interact with it.
However, I think one of the key things raised here - which it looks like there is agreement on - is the aspect of the API that involves passing in a collection and getting back a collection that can be related 1:1 with the original. I might be mistaken, but I don't think that approach has been used previously for other C API functions.
@paleolimbot this is separate from S2, unless you were suggesting using the API or implementation in S2 for spherical geometries as inspiration here?
It looks like @pramsey is tasked with #635 to port VW Simplifier from JTS, which is the first of many steps.
To be clear, the callback idea was specifically intended to mitigate that, requiring the caller to implement that using available GEOS C API simplification methods. However, this occurs in a very hot loop in the process, so in addition to putting an awful lot of responsibility on the callback (to pass back in a valid geometry), it may a big performance hit.
Yes, in order to make coverage simplification performant, the simplification strategy needs to be able to check if/when a change would cause invalidity to occur, and then back off. This requires some spatial indexing to be maintained. This needs to be exposed to the simplification strategy via an internal API. So there's a bit of a library-client dance required, which would be "fun" to implement in C!
one of the key things raised here - which it looks like there is agreement on - is the aspect of the API that involves passing in a collection and getting back a collection that can be related 1:1 with the original. I might be mistaken, but I don't think that approach has been used previously for other C API functions.
Definitely that is a key aspect of the design for this kind of bulk processing. Up until now GEOS (and JTS) have mainly been focussed on operations on single or pairs of geometries. Moving into providing bulk operations needs to follow this new pattern.
I didn't look into it, but under the hood, spatialite uses the following library: https://gitlab.com/rttopo/rttopo. Maybe a python wrapper of this library is an option?
Note that the first steps along the path to providing polygonal coverage operations in GEOS has been commited. It includes:
- validation
- gap finding
- union
These have not yet been exposed in the C API, pending strategizing about how to define a consistent API for coverage functions in general.
Coverage simplification is now being worked in JTS.
Coverage Simplification has now landed in JTS: https://github.com/locationtech/jts/pull/911 and https://github.com/locationtech/jts/pull/963.
Hopefully this will be ported to GEOS soon, and exposed via the C API.
See also blog post here.
Thanks for the work @dr-jts! This is something to look forward to. One question. What type of strategy did you implement to detect junctions. Do you consider a path/arc shared when coordinates are on the same path or when coordinates appear in both paths?
Eg. will the following example result in two shared arcs or one?
from shapely import geometry
from shapely.plotting import plot_line
import matplotlib.pyplot as plt
data = geometry.MultiLineString([
[(0, 0), (10, 0), (10, 5), (20, 5)],
[(5, 0), (20, 0), (20, 5), (10, 5), (0, 5)]
])
fig = plt.figure(figsize=(6.5, 2.5))
ax = fig.add_subplot(111)
plot_line(data.geoms[0], ax=ax, add_points=True, color='green', linewidth=5, alpha=0.7)
plot_line(data.geoms[1], ax=ax, add_points=True, color='orange')
plt.show()
Hopefully two🤞
What type of strategy did you implement to detect junctions. Do you consider a path/arc shared when coordinates are on the same path or when coordinates appear in both paths?
The input to the simplifier algorithm is a set of polygons. So not sure whether or how your question applies to that?
Not exactly sure either.. based on this code comment:
* * If the polygon boundary intersects another polygon boundary, the vertices
* and line segments of the intersection match exactly.
It seems a coordinated connected strategy is implemented, but again, what this exactly will mean for real-world use cases is yet to see. The goal here is also simplification and not returning coverages or topologies perse. Thank you for the feedback!
@mattijn check out this comment in the blog post regarding coverage validation:
And of course the next logical step is to provide the ability to fix errors
detected by coverage validation. This is a research project for the near term.
It's in that "fix" step that the cleanup will need to be implemented to get from a "path connected" to a "coordinate connected" or "valid polygon coverage" state I suppose.
I'm going to close this as "finished".