sf
sf copied to clipboard
sf 1.0 will switch to using S2 geometry for ellipsoidal coordinates
As announced here in June last year, and also in the NEWS file around the same time, it is planned that where possible sf
will use the spherical geometry operators of package s2
instead of the GEOS
routines that assume projected coordinates. I've now run reverse dependency checks and found regressions in the following packages ("check" means: pkg maintainer informed):
- [x] csodata
- [x] eSDM
- [x] geomultistar
- [x] jpmesh
- [x] mapping
- [x] mapsf
- [x] MODIS
- [x] nhdplusTools
- [x] nhdR
- [x] osmextract
- [x] redist
- [x] rerddapXtracto (false positive: url timeout?)
- [x] rmapzen
- [x] sen2r
- [x] sfnetworks
- [x] slga
- [x] tmap
- [x] tmaptools
I have opened issues / written emails to all packages affected (issues mentioned below).
Testing your package while sf
uses s2
can be done by setting
sf::sf_use_s2(TRUE)
or by specifying the environment variable _SF_USE_S2
to the value true
(case sensitive); the latter can be done from within R with
Sys.setenv("_SF_USE_S2", "true")
but that needs to be done before sf
is loaded.
Dear package author,
while checking reverse dependency's against the sf 1.0 candidate which uses s2 as a geometry back-end when using ellipsoidal coordinates, we found a regression. Please verify that this is not a false positive, and update your package on CRAN in a timely fashion before sf
is released (which is not earlier than May 1st).
See https://github.com/r-spatial/sf/issues/1649 for details.
For those interested in a low-res world map valid on S2, created by @paleolimbot :
library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
st_as_sf(s2_data_tbl_countries)
# Simple feature collection with 177 features and 2 fields
# Geometry type: MULTIPOLYGON
# Dimension: XY
# Bounding box: xmin: -180 ymin: -85.60904 xmax: 180 ymax: 83.64513
# CRS: NA
# First 10 features:
# name continent
# 1 Afghanistan Asia
# 2 Angola Africa
# 3 Albania Europe
# 4 United Arab Emirates Asia
# 5 Argentina South America
# 6 Armenia Asia
# 7 Antarctica Antarctica
# 8 French Southern and Antarctic Lands Seven seas (open ocean)
# 9 Australia Oceania
# 10 Austria Europe
# geometry
# 1 MULTIPOLYGON (((61.21082 35...
# 2 MULTIPOLYGON (((16.32653 -5...
# 3 MULTIPOLYGON (((20.59025 41...
# 4 MULTIPOLYGON (((51.57952 24...
# 5 MULTIPOLYGON (((-65.5 -55.2...
# 6 MULTIPOLYGON (((43.58275 41...
# 7 MULTIPOLYGON (((-59.57209 -...
# 8 MULTIPOLYGON (((68.935 -48....
# 9 MULTIPOLYGON (((145.398 -40...
# 10 MULTIPOLYGON (((16.97967 48...
For those getting errors involving redundant, invalid, or crossing edges, it is likely because s2 assumes the shortest distance on the sphere between two points (not along a constant latitude or longitude as would appear in a standard cartesian plot with points connected). A few things worth trying are:
-
st_segmentize()
in cartesian space before converting to s2. I don't know if there's an easy way to do this withst_use_s2(TRUE)
...maybe there should be? - Use
s2_rebuild()
, which is likest_make_valid()
but for the sphere. I wonder itst_as_s2()
should have arebuild
argument because right now it's a little hard to access (right now I think you need to dost_as_sf(s2_rebuild(s2_geog_from_wkb(st_as_binary(x), check = FALSE))))
) - Use
s2_snap_to_grid()
(under the hood this us justs2_rebuild()
with a snap function). It would probably be helpful if we made this easy to do as well.
I'm certainly happy to PR some of this in (but @edzer should decide how/if it's done since he's spent more time in sf!)
We can't call s2_rebuild
when the error already happens in s2_geog_from_wkb
:
library(tmap)
library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
data(World)
x = st_transform(World, 4326)
# qtm(x) # -->> fails
w = s2_geog_from_wkb(st_as_binary(st_geometry(x)))
# Error in s2_geography_from_wkb(wkb_bytes, oriented = oriented, check = check) :
# Loop 7 is not valid: Edge 171 crosses edge 174
# Calls: s2_geog_from_wkb -> new_s2_xptr -> s2_geography_from_wkb
# Execution halted
And the following also didn't help:
w = s2_geog_from_wkb(st_as_binary(st_segmentize(st_geometry(x), units::set_units(10, km))))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(st_geometry(x), units::set_units(1, km))))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(st_geometry(x), units::set_units(.1, km))))
g = st_geometry(x)
st_crs(g) = NA # so we segmentize in "degrees"
w = s2_geog_from_wkb(st_as_binary(st_segmentize(g, .01)))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(g, .1)))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(g, .001)))
w = s2_geog_from_wkb(st_as_binary(st_segmentize(g, .0001)))
I think the key is check = FALSE
:
library(tmap)
#> Warning: package 'tmap' was built under R version 4.0.3
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
#> Warning: package 's2' was built under R version 4.0.5
data(World)
x = st_transform(World, 4326)
# qtm(x) # -->> fails
w = s2_geog_from_wkb(st_as_binary(st_geometry(x)), check = FALSE)
library(s2plot) # remotes::install_github("paleolimbot/s2plot")
s2plot(s2_rebuild(w))
Created on 2021-04-19 by the reprex package (v2.0.0)
Bingo! This makes tmap
and tmaptools
pass check, and possibly a number of others. I will run an entire revdep check with this change, and report back. Out of interest: is s2_rebuild
an expensive operation?
I think so - it's something like a Union. I think it might be worth exposing check
as an argument to st_as_s2()
so that when a user gets this error they have recourse (rather than turning this on by default).
As a note, when I was testing with BigQuery, I did notice that they do this operation when taking any foreign input (wkt, wkb). I think they also round coordinates on input to snap level 30.
I think it might be worth exposing check as an argument to st_as_s2() so that when a user gets this error they have recourse (rather than turning this on by default).
That might be a bit hard: you'd have to pass it e.g. in every st_intersects(a, b)
, as that converts to s2, calls s2's intersection, and returns.
Are there any risks on the way back: S2 geometries converted to wkt, then to sf, then e.g. projected to whatever?
just to confirm – 6de3420 fixes all errors with polygons from rnaturalearth::ne_countries
, across any types and scales, both CRAN and github versions. I wrote a little test to identify which polygons were failing, but you got ahead of me ;)
The "risk" is mostly that it's slow and shouldn't have to be. I think that any operation that returns a gemoetry (intersection, etc.) will probably be unaffected but I forget the details and I'm worried that it could do things like change the direction of a line (I guess that a GEOS overlay operation might do this too though).
The overlay ops might be much faster than GEOS though so users might not notice?
Two ways to provide an escape hatch here without running all input through s2_rebuild()
would be
- Use the
st_precision()
of a sfc to decide (i.e., if a precision is defined, export to WKB without rounding, read into s2 withcheck = FALSE
, and calls2_snap_to_grid()
, which is rebuild with some defaults). Then if a user gets errors they can do e.g.st_union(st_set_precision(x))
. - Use
s2_rebuild()
inst_make_valid()
, whose input is always passed to s2 withcheck = FALSE
I think these are along the same lines as how users currently have to handle invalid input. In the meantime s2 should provide an equivalent to st_is_valid()
so that users like @jthurner don't have write their own tests for which polygons are invalid!
Re: st_make_valid()
- the underlying GEOS is being looked at in relation to a new planar geometry fixer: https://lists.osgeo.org/pipermail/geos-devel/2021-April/010205.html.
FWIW, here are the failing polygons from the most recent rnaturalearth
package versions. None of the approaches I had seen before today were able to make them convert (code).
#> Simple feature collection with 6 features and 6 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -180 ymin: 8.229188 xmax: 180 ymax: 81.2504
#> CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
#> dataset name st_valid fixed_prec fixed_prep_script fixed_lwgeom
#> 1 ne_countries_110 Sudan TRUE FALSE FALSE FALSE
#> 2 ne_countries_110 Russia TRUE FALSE TRUE FALSE
#> 3 ne_map_units_110 Sudan TRUE FALSE FALSE FALSE
#> 4 ne_map_units_110 Russia TRUE FALSE TRUE FALSE
#> 5 ne_sovereignty_110 Sudan TRUE FALSE FALSE FALSE
#> 6 ne_sovereignty_110 Russia TRUE FALSE TRUE FALSE
For rnaturalearth
on CRAN, there are a lot more errors, in both low and medium scale. Some go away with st_set_precision(1e7)
but the majority still fails with the methods tried.
- [x] Use the
st_precision()
of a sfc to decide (i.e., if a precision is defined, export to WKB without rounding, read into s2 withcheck = FALSE
, and calls2_snap_to_grid()
, which is rebuild with some defaults). Then if a user gets errors they can do e.g.st_union(st_set_precision(x))
. - [x] Use
s2_rebuild()
inst_make_valid()
, whose input is always passed to s2 withcheck = FALSE
Thanks for the message!
Question: In sfnetworks we have the sfnetwork
class. It extends igraph and is basically a collection of two separate sf
objects (one for nodes and one for edges). Since we have an sf::st_geometry()
method for this class, all sf functions (also those which are not generics) that wrap a function from the geometry back-end (in this case GEOS) work "out of the box", since they forward only the geometry to the underlying C++ code. E.g.: https://github.com/r-spatial/sf/blob/6de34206a19b3f33b939955bc4375de4756d5e83/R/nearest.R#L122
Personally I think this is quite a nice concept that also makes sense. All sf functions that only consider geometry - and not attributes - will always work if you implement a valid st_geometry method for your class, no matter if the sf function is a generic, or if your class inherits the sf class. But maybe - or probably ;) - sfnetworks is the only package using sf in this way.
After the switch to s2
, having an st_geometry method is not enough anymore to make the core geometry functions work, since the whole sf object is forwarded to the underlying s2 function, instead of only the geometry. E.g.: https://github.com/r-spatial/sf/blob/6de34206a19b3f33b939955bc4375de4756d5e83/R/nearest.R#L118
As far as I can see and understand, these sf objects are converted to s2 geography vectors with the default method of s2::s2_as_geography()
. This method then calls the default method of wk::as_wkb()
, and after some steps it ends up calling the sf method of wk::wk_handle()
, which then extracts only the geometry of the sf object (at this moments it breaks for sfnetwork objects, since there is no sfnetwork method for wk::wk_handle()
). Am I right that s2 only uses the geometry from the sf objects, and simply ignores the attributes? If yes: would it be an idea to - in line with how the GEOS code currently is called - already extract the geometry of the sf objects at the moment that sf calls the underlying s2 function. I.e.: s2::s2_closest_feature(st_geometry(x), st_geometry(y))
.
In any other case, I am interested to hear what your (@edzer or @paleolimbot) "best practice" advice would be for sfnetworks. Two options I see: add a sfnetwork specific method for s2::s2_as_geography()
to sfnetworks, or add an sfnetwork specific method for wk::wk_handle()
to sfnetworks. Both seem easy to implement and would fix all revdep errors with sfnetworks (but I guess we do have to add either s2
or wk
as dependency).
I can't speak to how the core sf methods are set up, but I can say that the use of wk::as_wkb()
and wk::wk_handle()
are still experimental and I would avoid them until I have established a stable pattern. You can, however, implement as_s2_geography()
if this helps you (as_s2_geography()
is defined for sf and sfc objects already).
If you are calling s2 methods repeatedly you may be able to save time by converting once on the way in (but make sure this is actually much faster because it probably adds a lot of complexity that sf has spent a lot of code getting right).
Hi @luukvdmeer ! I don't think you have to change anything of this kind, sf
should do all the conversion to and from s2
, no need to change your package.
Fun fact: https://github.com/r-spatial/sf/commit/6de34206a19b3f33b939955bc4375de4756d5e83 would solve all issues above. Too bad we can't go with it...
Trying the path trough getting s2_rebuild
inside st_make_valid
, I stumble upon this:
library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
data(World, package = "tmap")
s2 = st_transform(World, 4326) %>%
st_geometry() %>%
st_as_binary() %>%
as_s2_geography(check = FALSE) %>%
s2_rebuild() %>%
s2_as_binary()
s2 %>% as_s2_geography(check = TRUE)
# Error in s2_geography_from_wkb(x, oriented = oriented, check = check) :
# Loop 7 is not valid: Edge 171 crosses edge 174
# Calls: %>% ... as_s2_geography.blob -> new_s2_xptr -> s2_geography_from_wkb
do you have an idea, @paleolimbot ?
Are we sure check = FALSE
is getting passed on to s2_geog_from_wkb()
?
As far as I can tell, yes.
@edzer ~~in your example above, the s2_geography object returned by s2_rebuild()
is valid – but it fails when converting to binary again at the end.~~
EDIT: Oh well, the return of s2_rebuild() of course passes as_s2_geography()
Interestingly, it stops at the exact same spot as before the whole transformation:
World %>% as_s2_geography(check = TRUE)
#> Error in s2_geography_from_wkb(x, oriented = oriented, check = check): Loop 7 is not valid: Edge 171 crosses edge 174
Perhaps try s2_rebuild(options = s2_options(split_crossing_edges = TRUE))
?
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.2, PROJ 7.1.0
library(s2)
data(World, package = "tmap")
s2 = st_transform(World, 4326) %>%
st_geometry() %>%
st_as_binary() %>%
as_s2_geography(check = FALSE) %>%
s2_rebuild(options = s2_options(split_crossing_edges = TRUE)) %>%
s2_as_binary()
Created on 2021-04-19 by the reprex package (v0.3.0)
I should have checked more carefully...that will also result in an invalid ring if you poke it any more. This one works with s2_snap_to_grid()
:
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.2, PROJ 7.1.0
library(s2)
data(World, package = "tmap")
s2 = st_transform(World, 4326) %>%
st_geometry() %>%
st_as_binary() %>%
as_s2_geography(check = FALSE) %>%
s2_snap_to_grid(1e-6) %>%
s2_as_binary() %>%
s2_geog_from_wkb(check = TRUE)
Created on 2021-04-19 by the reprex package (v0.3.0)
Now, st_make_valid
makes unprojected geometries valid on S2, using s2::s2_rebuild
, using a precision of 1e7 if no precision has been set, and when the argument s2_options
has not been specified.
I will start another round of reverse depency checks, and look at the diffs. For tmap
and tmaptools
for instance, @mtennekes, this implies that st_make_valid
must be called on World
after it has been unprojected.
Thanks @edzer for the message at https://github.com/ranghetti/sen2r/issues/396.
It seems that the invalid geometry whose the error message refers is a consequence of a different output obtained using st_union()
versions 0.9.8 (CRAN) and 1.0.0 (branch sf_1_0).
Here below a reproducible example:
m0 <- matrix(c(0,0,0,1,1,1,1,0,0,0), nrow = 2)
p1 <- st_sfc(
st_polygon(list(t(m0))),
st_polygon(list(t(m0 + c(0, .9)))),
st_polygon(list(t(m0 + c(.9, 0)))),
st_polygon(list(t(m0 + .9))),
crs = 4326
)
plot(p1, col = "#88888844")
Using {sf} from CRAN, st_union()
merges the four squares (which is what I am expecting):
packageVersion("sf")
# [1] ‘0.9.8’
plot(st_union(p1), col = "#88888844")
Instead, using it from sf_1_0, st_union()
maintains only the "not-overlapped" portions of the polygons, creating an invalid geometry:
packageVersion("sf")
# [1] ‘1.0.0’
plot(st_union(p1), col = "#88888844")
st_is_valid(st_union(p1))
# [1] FALSE
The error message is obtained performing a difference:
p2 <- st_sfc(
st_polygon(list(t(m0 + .45))),
crs = 4326
)
st_difference(st_union(p1), p2)
# Error in s2_geography_from_wkb(x, oriented = oriented, check = check) :
# Loop 0 is not valid: Edge 7 is degenerate (duplicate vertex)
I see this behaviour only using geographical coordinates:
packageVersion("sf")
# [1] ‘1.0.0’
p3 <- st_transform(p1, 32631)
p4 <- st_sfc(
st_polygon(list(t(m0) * 1E5)),
st_polygon(list(t(m0 + c(0, .9)) * 1E5)),
st_polygon(list(t(m0 + c(.9, 0)) * 1E5)),
st_polygon(list(t(m0 + .9) * 1E5)),
crs = 32631
)
plot(st_union(p3), col = "#88888844")
plot(st_union(p4), col = "#88888844")
Thanks, @ranghetti - very helpful! In sf
I now fixed this with a temporary (=slow) fix until we get this properly sorted in s2
.
I will start another round of reverse depency checks, and look at the diffs. For
tmap
andtmaptools
for instance, @mtennekes, this implies thatst_make_valid
must be called onWorld
after it has been unprojected.
So you mean that st_make_valid
doesn't work anymore for projected sf
objects? Any other implications for tmap(tools)
? (I still have to catch up with s2
.)
it works for both, but differently: see e.g. https://keen-swartz-3146c4.netlify.app/spherical.html#validity-on-the-sphere
After unprojecting World
, it is interpreted on S2 and no longer valid, and needs to be pulled through st_make_valid()
.