sf
sf copied to clipboard
New algorithm for planar `st_make_valid`
GEOS 3.10 will include a new method for making invalid geometries valid. Should this method become the default in sf ? Is there interest in exposing parameters for the "MakeValid" algortithm in sf ? Currently the parameters would be:
- choice of algorithm
- whether "collapsed geometries" should be retained (e.g., an invalid polygon that is turned into a line)
The major difference between the two algorithms is the handling of invalid polygons with self-intersections and overlaps Here's some examples of how the new algorithm handles invalid polygonal geometry:
Polygon with self-overlap

Old result:

Polygon with overlapping holes

Old result:

MultiPolygon with overlapping elements

Old result:

Originally posted by @dr-jts in https://github.com/libgeos/geos/issues/433#issuecomment-821597724
@dbaston Dan: do you know whether the choice of GEOS_MAKE_VALID_ORIGINAL or GEOS_MAKE_VALID_BUFFERED also runs through GEOSMakeValid_r() controlled say by an environment variable, or only GEOSMakeValidWithOptions_r(). If the former, reverse dependency checking will not involve changes in compiled code in R packages using GEOS, which would let us report quicker. I'm rebuilding the github pramsey fork main-geometry-fixer branch now, having first forgotten to make sure that I had the right branch. However, just running revdeps for rgeos/sf with the existing GEOSMakeValid_r() may be a no-op if that simply uses GEOS_MAKE_VALID_ORIGINAL, so I'd like to be sure that the new code is actually being used.
@rsbivand GEOS does not use environment variables. Currently the only way to see the new behavior would be to use the GEOSMakeValidWithOptions signature.
Thanks! I am now adding a proof-of-concept to rgeos (R-forge) including an environment variable switch to aid testing. If this is helpful, copyng into sf shouldn't be hard.
@pramsey: this thread is more specific (transfer from https://github.com/r-spatial/sf/issues/1649#issuecomment-824952228). So far in rgeos replicating the examples given (I had to fake two of the three as I couldn't find them in https://docs.google.com/document/d/19YEQS0goSpZlwaYivS6ZpxJ5gRth2gdG6aY2XVd5fcc/edit# or https://github.com/locationtech/jts/blob/master/modules/tests/src/test/resources/testxml/misc/TestInvalidA.xml):

but so far no luck using the options to keep collapsed geometries (the second buffered should also keep collapsed). Please let me know when you push to the branch in your fork.
but so far no luck using the options to keep collapsed geometries (the second buffered should also keep collapsed). Please let me know when you push to the branch in your fork.
@rsbivand I'm unclear about your test above - I don't see any obvious collapse.
If you want to test the keepCollapse option, try this geometry:
MULTIPOLYGON (((10 40, 40 40, 40 10, 10 10, 10 40), (16 17, 31 32, 24 25, 16 17)), ((50 40, 50 40, 50 40, 50 40, 50 40)))
Result with keepCollapse:
GEOMETRYCOLLECTION (POINT (50 40), POLYGON ((40 40, 40 10, 10 10, 10 40, 40 40)))

Note that keepCollapse keeps entire geometries which have collapsed, but does not keep collapsed holes.
After updating to the changed function name and parameter object, I see:

which looks about right. sp doesn't like collections of objects of different types, sf is more compliant, but even in the sp object setting this looks OK. SVN rev 653 has completed building, so the rgeos source code will be easier to access from http://download.r-forge.r-project.org/src/contrib/rgeos_0.5-7.tar.gz, but will (of course) fail on any other 3.10.0 than https://github.com/pramsey/geos/tree/main-geometry-fixer (< 3.10.0 is OK). Function help page at https://rgeos.r-forge.r-project.org/reference/topo-unary-gMakeValid.html (updated to add another environment variable to set keep collapsed).
@rsbivand I merged into main, so hopefully that is the last re-naming (sorry, did one more on you) you have to handle. Thanks for taking an early look at the output, @dr-jts will be happy to see the feedback.
@pramsey thanks for the heads-up!
With status before this commit, I did not see any differences between original/buffered/buffered-keepCollapsed for the ~200 R packages that are reverse dependencies of rgeos, but did see
"mapmisc" Error in gTopoDim(spgeom2) : class not supported:SpatialCollections "maptools" Error in rgeos::gSimplify(spgeom = SP, tol = tolerance, topologyPreserve = topologyPreserve) : Geometry collections may not contain other geometry collections
that I don't recall seeing before - in all three 3.10.dev variants. I don't think this is caused by 3.9/3.10 improvements in GEOS, but if it is, this is useful because the previous behaviour was not as intended. So far so good - next is to implement the changes in sf https://github.com/r-spatial/sf/issues/1649 in relation to s2 integration for spherical geometries. The reverse dependencies of sf will be more informative, because many of those of rgeos are older, so were written before GEOSMakeValid() was available, and most likely do not exercise the function. rgeos at SVN rev. 663 is up to date with the gitea main GEOS repo with LINEWORK/STRUCTURE.
Error analysis: in 3.10.0, there is a change in geometries returned by GEOSSimplify_r():
"mapmisc" Error in gTopoDim(spgeom2) : class not supported:SpatialCollections : Problems in GEOSSimplify_r() returning geometry collections for individual geometries - not a problem in 3.9.1. Resolved by changing the default in rgeos::gSimplify() to use GEOSTopologyPreserveSimplify_r() instead by default if not given specifically by the user or rev-dep package.
"maptools" Error in rgeos::gSimplify(spgeom = SP, tol = tolerance, topologyPreserve = topologyPreserve) : Geometry collections may not contain other geometry collections: Failure was again in GEOSSimplify_r() returning a polygon and a line but the implementation only supported same dimension in and out; resolution, use GEOSTopologyPreserveSimplify_r(). In 3.9.1, GEOSSimplify_r() passes, no failure. The specific example is at: http://maptools.r-forge.r-project.org/reference/thinnedSpatialPoly.html#examples.
Otherwise, no visible changes from GEOSMakeValid_r(), but as noted above, rgeos uses were mostly written pre-3.8.0, so probably don't use GEOSMakeValid_r() anyway. More sf rev-deps do use st_make_valid().
This is kind of odd, there's been very little change in that code line, but there is one commit since 3.9, could this have had the effect you see? https://github.com/libgeos/geos/commit/e6c6fa39e32caf8bc15022cd17f47334a510197b#diff-97c48336edbb9a6f90de3e7b4424c22c8a97c9337631ac5cab30f8ff44ad6e3e
Possibly, since in the observed cases before 3.10, simplified polygons were returned just as polygons (dimension 2), but with 3.10dev there are geometry collections with polygons and lines unless topology is preserved.
This is kind of odd, there's been very little change in that code line, but there is one commit since 3.9, could this have had the effect you see? libgeos/geos@e6c6fa3#diff-97c48336edbb9a6f90de3e7b4424c22c8a97c9337631ac5cab30f8ff44ad6e3e
That could well be the problem. And it's likely important to fix (in JTS and GEOS) since it's (a) a regression and (b) unwanted behaviour. I guess there weren't any unit tests for DPSimplfy that produced that behaviour.
Any chance of getting a simple reproducer for the simplify error?
Here are two North Carolina NAD27 counties, which return non-polygon output for 3.10dev with topology preserve FALSE, but are OK for topology preserve TRUE for the same tolerance value of 0.05 (for 3.9.1, both TRUE and FALSE pass) for this file: nc34.zip In rgeos:
> library(rgeos)
Loading required package: sp
rgeos version: 0.5-7, (SVN revision 666)
GEOS runtime version: 3.10.0dev-CAPI-1.15.0
Linking to sp version: 1.4-6
Polygon checking: TRUE
> nc34 <- readWKT(readLines("nc34.wkt"))
> xx <- gSimplify(nc34, tol=0.05)
> xx <- gSimplify(nc34, tol=0.05, topologyPreserve=TRUE)
> xx <- gSimplify(nc34, tol=0.05, topologyPreserve=FALSE)
output subgeometry 1, row.name: 2
subsubgeometry 0: Polygon
subsubgeometry 1: Polygon
subsubgeometry 2: LineString
Error in gSimplify(nc34, tol = 0.05, topologyPreserve = FALSE) :
Geometry collections may not contain other geometry collections
> class(nc34)
[1] "SpatialPolygons"
attr(,"package")
[1] "sp"

The output for county 4 alone passes, because it is a valid geometry collection, so 3 and 4 (here base-1 1 & 2, base-0 0 & 1) are both needed to show the geometry collection in geometry collection problem, which is not supported in the R class representation, in which geometry collections can only contain points, lines, polygons and their multi-variants, but not other geometry collections. I think that Currituck National Wildlife Refuge gets reduced to a line string here when topology preserve is FALSE. For 3.9.1, this line is discarded before attempting to convert back to the sp representation:

Thanks for the reproducer. This is definitely a regression. Reported upstream as GEOS 1115. We'll work on resurrecting the old behaviour.
GEOS DouglasPeuckerSimplifier is now fixed to avoid returning mixed collections (https://github.com/libgeos/geos/commit/f3dfbdc949dbe954c818e63abceb4e493ab396ff)
@dr-jts Thanks, the two failing reverse dependencies are now behaving as before without needing to impose topology preservation. Let's see how quickly sf takes up GEOSMakeValidWithParams(), because more of sf's reverse dependencies use st_make_valid().
OK, I now see, using @dr-jts ' example above:
library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
"MULTIPOLYGON (((10 40, 40 40, 40 10, 10 10, 10 40), (16 17, 31 32, 24 25, 16 17)), ((50 40, 50 40, 50 40, 50 40, 50 40)))" |>
st_as_sfc() |>
st_make_valid() |> # current defaults: geos_method = "valid_structure", geos_keep_collapsed = TRUE
st_as_text()
# [1] "GEOMETRYCOLLECTION (POINT (50 40), POLYGON ((10 40, 40 40, 40 10, 10 10, 10 40)))"
"MULTIPOLYGON (((10 40, 40 40, 40 10, 10 10, 10 40), (16 17, 31 32, 24 25, 16 17)), ((50 40, 50 40, 50 40, 50 40, 50 40)))" |>
st_as_sfc() |>
st_make_valid(geos_method = "valid_linework") |> # should ignore keep_collapsed
st_as_text()
# [1] "GEOMETRYCOLLECTION (POLYGON ((40 40, 40 10, 10 10, 10 40, 40 40)), MULTILINESTRING ((16 17, 24 25), (24 25, 31 32)), POINT (50 40))"
"MULTIPOLYGON (((10 40, 40 40, 40 10, 10 10, 10 40), (16 17, 31 32, 24 25, 16 17)), ((50 40, 50 40, 50 40, 50 40, 50 40)))" |>
st_as_sfc() |>
st_make_valid(geos_keep_collapsed = FALSE) |>
st_as_text()
# [1] "POLYGON ((10 40, 40 40, 40 10, 10 10, 10 40))"
Two questions:
- is adopting the new strategy (valid_structure & keep_collapsed) is a good default? Or should the old strategy be a better default, so we don't get any revdep surprises now?
- in the GEOS docs I see for setKeepCollapsed: "When this parameter is not set to 0, the GEOS_MAKE_VALID_STRUCTURE method will drop any component that has collapsed into a lower dimensionality. For example, a ring collapsing to a line, or a line collapsing to a point." Should drop be keep? If that is not the case, then I don't understand the output I get.
- in the GEOS docs I see for setKeepCollapsed: "When this parameter is not set to 0, the GEOS_MAKE_VALID_STRUCTURE method will drop any component that has collapsed into a lower dimensionality. For example, a ring collapsing to a line, or a line collapsing to a point." Should drop be keep? If that is not the case, then I don't understand the output I get.
Yes, that should be "keep". I'll make the change.