sf icon indicating copy to clipboard operation
sf copied to clipboard

st_buffer doesn't work properly for lat-long coordinates and small buffers

Open barryrowlingson opened this issue 3 years ago • 7 comments

example: make a one-degree unit square, in epsg 4326 and equirectangular

> library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
> m = cbind(c(0,1,1,0,0),c(0,0,1,1,0))
> p1 = st_sfc(st_polygon(list(m)), crs=4326)
> p1e = st_transform(p1, "+proj=eqc")

Now buffer the equirectangular by a generous chunk of its width and plot the results:

> p1eb = st_buffer(p1e, dist=8000)
> plot(p1eb)
> axis(1)
> plot(p1e, add=TRUE)

image

Looks good. Now try with the lat-long square, using a 0.1 buffer distance. I first thought this was meant to be degrees which is why I tried this. I'm sure this is metres now. Anyway it gets me this:

> p1b = st_buffer(p1, dist=0.1)
> plot(p1b)
> axis(1)

image

It seems to have extended slightly to the S and W except for a notch in the SW corner (at Null Island...)

Creating a zero-distance buffer shows a bit more glitchyness:

> p1b = st_buffer(p1, dist=0)
> plot(p1b, lwd=3, border="blue")
> plot(p1,border="red", add=TRUE)

image

Even large buffers (which I guess are in metres?) produce glitchy results in places:

> p1b10k = st_buffer(p1, dist=10000)
> plot(p1b10k, lwd=3)

the corners are quite steppy all round, and there's two nicks taken out of the northern border. I hope this doesn't start a war:

image

Buffering the equirectangular square by 10km and overlaying with the buffered 4326 square (transformed to equirectangular) shows the steppiness of the 4326 buffer compared to the smooth equirectangular buffer:

> p1e10k = st_buffer(p1e, dist=10000)
> plot(st_transform(p1b10k,st_crs(p1e10k)), lwd=1,add=TRUE,border="red")
> axis(1)

image

I'd hazard a guess that this is some tolerance when buffering lat-long coords, and possibly related to the s2 spherical geometry changes? I'd also hazard a guess that you've seen this and already fixed it :)

Session Info:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         class_7.3-17       crayon_1.3.4       dplyr_1.0.2       
 [5] wk_0.4.1           grid_4.0.3         R6_2.5.0           lifecycle_1.0.0   
 [9] DBI_1.1.0          magrittr_2.0.1     e1071_1.7-4        units_0.6-7       
[13] pillar_1.4.7       KernSmooth_2.23-17 rlang_0.4.10       ellipsis_0.3.1    
[17] vctrs_0.3.6        generics_0.1.0     s2_1.0.5           tools_4.0.3       
[21] glue_1.4.2         purrr_0.3.4        compiler_4.0.3     pkgconfig_2.0.3   
[25] classInt_0.4-3     tidyselect_1.1.0   tibble_3.0.4      

barryrowlingson avatar Jun 10 '21 10:06 barryrowlingson

No, it's not fixed; see https://r-spatial.github.io/sf/articles/sf7.html#buffers-1 for a description.

Your example is on the equator, but with realistic examples anything further away will have direction-dependent buffers when treating ellipsoidal coordinates as Cartesian (the GEOS way). So it's not good what we have now, but it's not good what we had either.

edzer avatar Jun 10 '21 11:06 edzer

Here is what I mean:

library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

(pt = st_sfc(st_point(c(7, 52)), crs = 'OGC:CRS84'))
# Geometry set for 1 feature 
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 7 ymin: 52 xmax: 7 ymax: 52
# Geodetic CRS:  WGS 84
# POINT (7 52)

sf_use_s2(TRUE)
plot(st_buffer(pt, units::set_units(1, degree)), axes = TRUE, main = "s2")
plot(pt, add = TRUE)

x1

sf_use_s2(FALSE)
# Spherical geometry (s2) switched off
plot(st_buffer(pt, units::set_units(1, degree)), axes = TRUE, main = "GEOS")
# Warning message:
# In st_buffer.sfc(pt, units::set_units(1, degree)) :
#   st_buffer does not correctly buffer longitude/latitude data
plot(pt, add = TRUE)

x2

The first one is ragged, the second one has distances wrong. Picking a sensible default for the number of cells in the first case is still an open problem; narrow buffers around lines e.g. will need a lot.

The cell-based buffer from s2 always contains the entire "true" (smooth) buffer shape, and can be used to pre-select features, using distance calculations on them afterwards.

edzer avatar Jun 10 '21 12:06 edzer

I've used a 0 buffer to clean polygons over the years. This issue of a very small buffer adding noise to polygon edges throws a wrench in that. I guess the buffer-as-cleaning hack should go by the way side, but how else should I handle removing duplicate geometry nodes on polygons?

dblodgett-usgs avatar Jan 03 '22 16:01 dblodgett-usgs

@dblodgett-usgs What is sf_use_s2()? For planar geometries, it should work, but for spherical geometries may not, as I think your ndhplus issue indicates. Also the st_make_valid() function for planar geometries only is probably more robust than zero-buffering for recent GEOS versions.

rsbivand avatar Jan 03 '22 17:01 rsbivand

sf::sf_use_s2(FALSE) forces sf to use geos rather than s2 where applicable, by my understanding. In my package code, where use a 0 buffer to clean up potentially problematic geometry, I can just use that to avoid the issue discussed above.

It's good to know that st_make_valid() is going to work better. I remember needing this to get geometries that would play nice with ArcGIS's geometry validation rules -- that has always been a bit of a dark art in my experience as things that are valid in GEOS or other tools like PostGIS or the Java Topology Suite can still be invalid in Arc.

dblodgett-usgs avatar Jan 03 '22 18:01 dblodgett-usgs

Does Arc follow some kind of open standard about how it defines valid?

edzer avatar Jan 05 '22 14:01 edzer

There's a lot going on there. Maybe some documentation has come out recently, but when I was working on a subsetter, I was unable to find anything definitive. I went through a whole process to create this: https://github.com/USGS-R/nhdplusTools/blob/master/R/subset_nhdplus.R#L604

Where I would load something into a geopackage and try to open it in arcpro. One of the major things was duplicated nodes -- Arc would just bomb without telling me anything about why but removing the duplicate nodes solved the issue.

dblodgett-usgs avatar Jan 05 '22 14:01 dblodgett-usgs