sf icon indicating copy to clipboard operation
sf copied to clipboard

sf 1.0 will switch to using S2 geometry for ellipsoidal coordinates

Open edzer opened this issue 3 years ago • 66 comments

As announced here in June last year, and also in the NEWS file around the same time, it is planned that where possible sf will use the spherical geometry operators of package s2 instead of the GEOS routines that assume projected coordinates. I've now run reverse dependency checks and found regressions in the following packages ("check" means: pkg maintainer informed):

  • [x] csodata
  • [x] eSDM
  • [x] geomultistar
  • [x] jpmesh
  • [x] mapping
  • [x] mapsf
  • [x] MODIS
  • [x] nhdplusTools
  • [x] nhdR
  • [x] osmextract
  • [x] redist
  • [x] rerddapXtracto (false positive: url timeout?)
  • [x] rmapzen
  • [x] sen2r
  • [x] sfnetworks
  • [x] slga
  • [x] tmap
  • [x] tmaptools

I have opened issues / written emails to all packages affected (issues mentioned below).

Testing your package while sf uses s2 can be done by setting

sf::sf_use_s2(TRUE)

or by specifying the environment variable _SF_USE_S2 to the value true (case sensitive); the latter can be done from within R with

Sys.setenv("_SF_USE_S2", "true")

but that needs to be done before sf is loaded.

edzer avatar Apr 17 '21 19:04 edzer

Dear package author,

while checking reverse dependency's against the sf 1.0 candidate which uses s2 as a geometry back-end when using ellipsoidal coordinates, we found a regression. Please verify that this is not a false positive, and update your package on CRAN in a timely fashion before sf is released (which is not earlier than May 1st).

See https://github.com/r-spatial/sf/issues/1649 for details.

edzer avatar Apr 17 '21 19:04 edzer

For those interested in a low-res world map valid on S2, created by @paleolimbot :

library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
st_as_sf(s2_data_tbl_countries)
# Simple feature collection with 177 features and 2 fields
# Geometry type: MULTIPOLYGON
# Dimension:     XY
# Bounding box:  xmin: -180 ymin: -85.60904 xmax: 180 ymax: 83.64513
# CRS:           NA
# First 10 features:
#                                   name               continent
# 1                          Afghanistan                    Asia
# 2                               Angola                  Africa
# 3                              Albania                  Europe
# 4                 United Arab Emirates                    Asia
# 5                            Argentina           South America
# 6                              Armenia                    Asia
# 7                           Antarctica              Antarctica
# 8  French Southern and Antarctic Lands Seven seas (open ocean)
# 9                            Australia                 Oceania
# 10                             Austria                  Europe
#                          geometry
# 1  MULTIPOLYGON (((61.21082 35...
# 2  MULTIPOLYGON (((16.32653 -5...
# 3  MULTIPOLYGON (((20.59025 41...
# 4  MULTIPOLYGON (((51.57952 24...
# 5  MULTIPOLYGON (((-65.5 -55.2...
# 6  MULTIPOLYGON (((43.58275 41...
# 7  MULTIPOLYGON (((-59.57209 -...
# 8  MULTIPOLYGON (((68.935 -48....
# 9  MULTIPOLYGON (((145.398 -40...
# 10 MULTIPOLYGON (((16.97967 48...

edzer avatar Apr 17 '21 20:04 edzer

For those getting errors involving redundant, invalid, or crossing edges, it is likely because s2 assumes the shortest distance on the sphere between two points (not along a constant latitude or longitude as would appear in a standard cartesian plot with points connected). A few things worth trying are:

  • st_segmentize() in cartesian space before converting to s2. I don't know if there's an easy way to do this with st_use_s2(TRUE)...maybe there should be?
  • Use s2_rebuild(), which is like st_make_valid() but for the sphere. I wonder it st_as_s2() should have a rebuild argument because right now it's a little hard to access (right now I think you need to do st_as_sf(s2_rebuild(s2_geog_from_wkb(st_as_binary(x), check = FALSE)))))
  • Use s2_snap_to_grid() (under the hood this us just s2_rebuild() with a snap function). It would probably be helpful if we made this easy to do as well.

I'm certainly happy to PR some of this in (but @edzer should decide how/if it's done since he's spent more time in sf!)

paleolimbot avatar Apr 17 '21 22:04 paleolimbot

We can't call s2_rebuild when the error already happens in s2_geog_from_wkb:

library(tmap)
library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
data(World)
x = st_transform(World, 4326)
# qtm(x) # -->> fails
w = s2_geog_from_wkb(st_as_binary(st_geometry(x)))
# Error in s2_geography_from_wkb(wkb_bytes, oriented = oriented, check = check) : 
#   Loop 7 is not valid: Edge 171 crosses edge 174
# Calls: s2_geog_from_wkb -> new_s2_xptr -> s2_geography_from_wkb
# Execution halted

edzer avatar Apr 19 '21 08:04 edzer

And the following also didn't help:

w = s2_geog_from_wkb(st_as_binary(st_segmentize(st_geometry(x), units::set_units(10, km))))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(st_geometry(x), units::set_units(1, km))))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(st_geometry(x), units::set_units(.1, km))))
g = st_geometry(x)
st_crs(g) = NA # so we segmentize in "degrees"
w = s2_geog_from_wkb(st_as_binary(st_segmentize(g, .01)))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(g, .1)))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(g, .001)))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(g, .0001)))

edzer avatar Apr 19 '21 08:04 edzer

I think the key is check = FALSE:

library(tmap)
#> Warning: package 'tmap' was built under R version 4.0.3
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
#> Warning: package 's2' was built under R version 4.0.5
data(World)
x = st_transform(World, 4326)
# qtm(x) # -->> fails
w = s2_geog_from_wkb(st_as_binary(st_geometry(x)), check = FALSE)

library(s2plot) # remotes::install_github("paleolimbot/s2plot")
s2plot(s2_rebuild(w))

Created on 2021-04-19 by the reprex package (v2.0.0)

paleolimbot avatar Apr 19 '21 11:04 paleolimbot

Bingo! This makes tmap and tmaptools pass check, and possibly a number of others. I will run an entire revdep check with this change, and report back. Out of interest: is s2_rebuild an expensive operation?

edzer avatar Apr 19 '21 11:04 edzer

I think so - it's something like a Union. I think it might be worth exposing check as an argument to st_as_s2() so that when a user gets this error they have recourse (rather than turning this on by default).

As a note, when I was testing with BigQuery, I did notice that they do this operation when taking any foreign input (wkt, wkb). I think they also round coordinates on input to snap level 30.

paleolimbot avatar Apr 19 '21 12:04 paleolimbot

I think it might be worth exposing check as an argument to st_as_s2() so that when a user gets this error they have recourse (rather than turning this on by default).

That might be a bit hard: you'd have to pass it e.g. in every st_intersects(a, b), as that converts to s2, calls s2's intersection, and returns.

Are there any risks on the way back: S2 geometries converted to wkt, then to sf, then e.g. projected to whatever?

edzer avatar Apr 19 '21 12:04 edzer

just to confirm – 6de3420 fixes all errors with polygons from rnaturalearth::ne_countries, across any types and scales, both CRAN and github versions. I wrote a little test to identify which polygons were failing, but you got ahead of me ;)

jthurner avatar Apr 19 '21 12:04 jthurner

The "risk" is mostly that it's slow and shouldn't have to be. I think that any operation that returns a gemoetry (intersection, etc.) will probably be unaffected but I forget the details and I'm worried that it could do things like change the direction of a line (I guess that a GEOS overlay operation might do this too though).

The overlay ops might be much faster than GEOS though so users might not notice?

Two ways to provide an escape hatch here without running all input through s2_rebuild() would be

  • Use the st_precision() of a sfc to decide (i.e., if a precision is defined, export to WKB without rounding, read into s2 with check = FALSE, and call s2_snap_to_grid(), which is rebuild with some defaults). Then if a user gets errors they can do e.g. st_union(st_set_precision(x)).
  • Use s2_rebuild() in st_make_valid(), whose input is always passed to s2 with check = FALSE

I think these are along the same lines as how users currently have to handle invalid input. In the meantime s2 should provide an equivalent to st_is_valid() so that users like @jthurner don't have write their own tests for which polygons are invalid!

paleolimbot avatar Apr 19 '21 12:04 paleolimbot

Re: st_make_valid() - the underlying GEOS is being looked at in relation to a new planar geometry fixer: https://lists.osgeo.org/pipermail/geos-devel/2021-April/010205.html.

rsbivand avatar Apr 19 '21 12:04 rsbivand

FWIW, here are the failing polygons from the most recent rnaturalearth package versions. None of the approaches I had seen before today were able to make them convert (code).

#> Simple feature collection with 6 features and 6 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -180 ymin: 8.229188 xmax: 180 ymax: 81.2504
#> CRS:           +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
#>              dataset   name st_valid fixed_prec fixed_prep_script fixed_lwgeom
#> 1   ne_countries_110  Sudan     TRUE      FALSE             FALSE        FALSE
#> 2   ne_countries_110 Russia     TRUE      FALSE              TRUE        FALSE
#> 3   ne_map_units_110  Sudan     TRUE      FALSE             FALSE        FALSE
#> 4   ne_map_units_110 Russia     TRUE      FALSE              TRUE        FALSE
#> 5 ne_sovereignty_110  Sudan     TRUE      FALSE             FALSE        FALSE
#> 6 ne_sovereignty_110 Russia     TRUE      FALSE              TRUE        FALSE

For rnaturalearth on CRAN, there are a lot more errors, in both low and medium scale. Some go away with st_set_precision(1e7) but the majority still fails with the methods tried.

jthurner avatar Apr 19 '21 13:04 jthurner

  • [x] Use the st_precision() of a sfc to decide (i.e., if a precision is defined, export to WKB without rounding, read into s2 with check = FALSE, and call s2_snap_to_grid(), which is rebuild with some defaults). Then if a user gets errors they can do e.g. st_union(st_set_precision(x)).
  • [x] Use s2_rebuild() in st_make_valid(), whose input is always passed to s2 with check = FALSE

edzer avatar Apr 19 '21 14:04 edzer

Thanks for the message!

Question: In sfnetworks we have the sfnetwork class. It extends igraph and is basically a collection of two separate sf objects (one for nodes and one for edges). Since we have an sf::st_geometry() method for this class, all sf functions (also those which are not generics) that wrap a function from the geometry back-end (in this case GEOS) work "out of the box", since they forward only the geometry to the underlying C++ code. E.g.: https://github.com/r-spatial/sf/blob/6de34206a19b3f33b939955bc4375de4756d5e83/R/nearest.R#L122

Personally I think this is quite a nice concept that also makes sense. All sf functions that only consider geometry - and not attributes - will always work if you implement a valid st_geometry method for your class, no matter if the sf function is a generic, or if your class inherits the sf class. But maybe - or probably ;) - sfnetworks is the only package using sf in this way.

After the switch to s2, having an st_geometry method is not enough anymore to make the core geometry functions work, since the whole sf object is forwarded to the underlying s2 function, instead of only the geometry. E.g.: https://github.com/r-spatial/sf/blob/6de34206a19b3f33b939955bc4375de4756d5e83/R/nearest.R#L118

As far as I can see and understand, these sf objects are converted to s2 geography vectors with the default method of s2::s2_as_geography(). This method then calls the default method of wk::as_wkb(), and after some steps it ends up calling the sf method of wk::wk_handle(), which then extracts only the geometry of the sf object (at this moments it breaks for sfnetwork objects, since there is no sfnetwork method for wk::wk_handle()). Am I right that s2 only uses the geometry from the sf objects, and simply ignores the attributes? If yes: would it be an idea to - in line with how the GEOS code currently is called - already extract the geometry of the sf objects at the moment that sf calls the underlying s2 function. I.e.: s2::s2_closest_feature(st_geometry(x), st_geometry(y)).

In any other case, I am interested to hear what your (@edzer or @paleolimbot) "best practice" advice would be for sfnetworks. Two options I see: add a sfnetwork specific method for s2::s2_as_geography() to sfnetworks, or add an sfnetwork specific method for wk::wk_handle() to sfnetworks. Both seem easy to implement and would fix all revdep errors with sfnetworks (but I guess we do have to add either s2 or wk as dependency).

luukvdmeer avatar Apr 19 '21 16:04 luukvdmeer

I can't speak to how the core sf methods are set up, but I can say that the use of wk::as_wkb() and wk::wk_handle() are still experimental and I would avoid them until I have established a stable pattern. You can, however, implement as_s2_geography() if this helps you (as_s2_geography() is defined for sf and sfc objects already).

If you are calling s2 methods repeatedly you may be able to save time by converting once on the way in (but make sure this is actually much faster because it probably adds a lot of complexity that sf has spent a lot of code getting right).

paleolimbot avatar Apr 19 '21 16:04 paleolimbot

Hi @luukvdmeer ! I don't think you have to change anything of this kind, sf should do all the conversion to and from s2, no need to change your package.

edzer avatar Apr 19 '21 19:04 edzer

Fun fact: https://github.com/r-spatial/sf/commit/6de34206a19b3f33b939955bc4375de4756d5e83 would solve all issues above. Too bad we can't go with it...

edzer avatar Apr 19 '21 19:04 edzer

Trying the path trough getting s2_rebuild inside st_make_valid, I stumble upon this:

library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
data(World, package = "tmap")
s2 = st_transform(World, 4326) %>% 
	st_geometry() %>%
	st_as_binary() %>%
	as_s2_geography(check = FALSE) %>%
	s2_rebuild() %>%
	s2_as_binary()

s2 %>% as_s2_geography(check = TRUE)
# Error in s2_geography_from_wkb(x, oriented = oriented, check = check) : 
#   Loop 7 is not valid: Edge 171 crosses edge 174
# Calls: %>% ... as_s2_geography.blob -> new_s2_xptr -> s2_geography_from_wkb

do you have an idea, @paleolimbot ?

edzer avatar Apr 19 '21 19:04 edzer

Are we sure check = FALSE is getting passed on to s2_geog_from_wkb()?

paleolimbot avatar Apr 19 '21 19:04 paleolimbot

As far as I can tell, yes.

edzer avatar Apr 19 '21 19:04 edzer

@edzer ~~in your example above, the s2_geography object returned by s2_rebuild() is valid – but it fails when converting to binary again at the end.~~ EDIT: Oh well, the return of s2_rebuild() of course passes as_s2_geography()

Interestingly, it stops at the exact same spot as before the whole transformation:

World %>% as_s2_geography(check = TRUE)
#> Error in s2_geography_from_wkb(x, oriented = oriented, check = check): Loop 7 is not valid: Edge 171 crosses edge 174

jthurner avatar Apr 19 '21 21:04 jthurner

Perhaps try s2_rebuild(options = s2_options(split_crossing_edges = TRUE))?

library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.2, PROJ 7.1.0
library(s2)
data(World, package = "tmap")
s2 = st_transform(World, 4326) %>% 
  st_geometry() %>%
  st_as_binary() %>%
  as_s2_geography(check = FALSE) %>%
  s2_rebuild(options = s2_options(split_crossing_edges = TRUE)) %>% 
  s2_as_binary()

Created on 2021-04-19 by the reprex package (v0.3.0)

paleolimbot avatar Apr 19 '21 21:04 paleolimbot

I should have checked more carefully...that will also result in an invalid ring if you poke it any more. This one works with s2_snap_to_grid():

library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.2, PROJ 7.1.0
library(s2)
data(World, package = "tmap")
s2 = st_transform(World, 4326) %>% 
  st_geometry() %>%
  st_as_binary() %>%
  as_s2_geography(check = FALSE) %>%
  s2_snap_to_grid(1e-6) %>% 
  s2_as_binary() %>% 
  s2_geog_from_wkb(check = TRUE)

Created on 2021-04-19 by the reprex package (v0.3.0)

paleolimbot avatar Apr 20 '21 00:04 paleolimbot

Now, st_make_valid makes unprojected geometries valid on S2, using s2::s2_rebuild, using a precision of 1e7 if no precision has been set, and when the argument s2_options has not been specified.

edzer avatar Apr 20 '21 12:04 edzer

I will start another round of reverse depency checks, and look at the diffs. For tmap and tmaptools for instance, @mtennekes, this implies that st_make_valid must be called on World after it has been unprojected.

edzer avatar Apr 20 '21 12:04 edzer

Thanks @edzer for the message at https://github.com/ranghetti/sen2r/issues/396. It seems that the invalid geometry whose the error message refers is a consequence of a different output obtained using st_union() versions 0.9.8 (CRAN) and 1.0.0 (branch sf_1_0).

Here below a reproducible example:

m0 <- matrix(c(0,0,0,1,1,1,1,0,0,0), nrow = 2)
p1 <- st_sfc(
  st_polygon(list(t(m0))),
  st_polygon(list(t(m0 + c(0, .9)))),
  st_polygon(list(t(m0 + c(.9, 0)))),
  st_polygon(list(t(m0 + .9))),
  crs = 4326
)
plot(p1, col = "#88888844")

image

Using {sf} from CRAN, st_union() merges the four squares (which is what I am expecting):

packageVersion("sf")
# [1] ‘0.9.8’
plot(st_union(p1), col = "#88888844")

image

Instead, using it from sf_1_0, st_union() maintains only the "not-overlapped" portions of the polygons, creating an invalid geometry:

packageVersion("sf")
# [1] ‘1.0.0’
plot(st_union(p1), col = "#88888844")

image

st_is_valid(st_union(p1))
# [1] FALSE

The error message is obtained performing a difference:

p2 <- st_sfc(
  st_polygon(list(t(m0 + .45))),
  crs = 4326
)
st_difference(st_union(p1), p2)
# Error in s2_geography_from_wkb(x, oriented = oriented, check = check) :
#    Loop 0 is not valid: Edge 7 is degenerate (duplicate vertex)

I see this behaviour only using geographical coordinates:

packageVersion("sf")
# [1] ‘1.0.0’
p3 <- st_transform(p1, 32631)
p4 <- st_sfc(
  st_polygon(list(t(m0) * 1E5)),
  st_polygon(list(t(m0 + c(0, .9)) * 1E5)),
  st_polygon(list(t(m0 + c(.9, 0)) * 1E5)),
  st_polygon(list(t(m0 + .9) * 1E5)),
  crs = 32631
)
plot(st_union(p3), col = "#88888844")

image

plot(st_union(p4), col = "#88888844")

image

ranghetti avatar Apr 20 '21 15:04 ranghetti

Thanks, @ranghetti - very helpful! In sf I now fixed this with a temporary (=slow) fix until we get this properly sorted in s2.

edzer avatar Apr 21 '21 12:04 edzer

I will start another round of reverse depency checks, and look at the diffs. For tmap and tmaptools for instance, @mtennekes, this implies that st_make_valid must be called on World after it has been unprojected.

So you mean that st_make_valid doesn't work anymore for projected sf objects? Any other implications for tmap(tools)? (I still have to catch up with s2.)

mtennekes avatar Apr 21 '21 13:04 mtennekes

it works for both, but differently: see e.g. https://keen-swartz-3146c4.netlify.app/spherical.html#validity-on-the-sphere

After unprojecting World, it is interpreted on S2 and no longer valid, and needs to be pulled through st_make_valid().

edzer avatar Apr 21 '21 13:04 edzer