sf icon indicating copy to clipboard operation
sf copied to clipboard

Should sf_use_s2() be an option?

Open MilesMcBain opened this issue 1 year ago • 5 comments

sf_use_s2() manipulates a variable hidden inside an environment in the sf namespace. This approach causes problems when using {targets} - which will dutifully copy options and global variables set in the plan (head thread) to the environment of targets that are evaluated in worker threads. {targets} won't copy state hidden inside the {sf} namespace though. So worker threads will default back to sf_use_s2(TRUE).

There is a work around for targets, which is to create a project level .Rprofile that contains a call to sf_use_s2().

There doesn't seem to be much of a payoff associated with this unconventional approach. sf_use_s2()'s main purpose seems to be to check whether {s2} is installed and warn if not. This is not necessary at this point because sf imports s2 and the default within sf is to use s2 where possible.

MilesMcBain avatar Jul 19 '22 07:07 MilesMcBain

sf_use_s2()'s main purpose seems to be to check whether {s2} is installed and warn if not.

There's more to it: for geometrical operations sf_use_s2() determines whether to choose a spherical algorithm or a planar algorithm. Many datasets are prepared for 2D, flat approaches to global data (GIS legacy), so users of such datasets may want to switch using s2 off, despite it's availability. It allows users to switch entirely to 2-D flat earth representation without uninstalling a package.

edzer avatar Jul 19 '22 07:07 edzer

This is the current implementation:

sf_use_s2 = function(use_s2) {
	ret_val = get(".sf.use_s2", envir = .sf_cache)
	if (! missing(use_s2)) {
		stopifnot(is.logical(use_s2), length(use_s2)==1, !is.na(use_s2))
		if (use_s2 && !requireNamespace("s2", quietly = TRUE))
			stop("package s2 not available: install it first?")
		if (ret_val != use_s2)
			message(paste0("Spherical geometry (s2) switched ", ifelse(use_s2, "on", "off")))
		assign(".sf.use_s2", use_s2, envir = .sf_cache)
		invisible(ret_val)
	} else
		ret_val
}

And we're comparing this against the standard:

options(sf_use_s2 = TRUE) # user does this
getOption("sf_use_s2", default = TRUE) # sf code does this

The thing the current function does that is not covered by the standard way:

  • Checks if {s2} is installed - now irrelevant.
  • Messages the user that the mode has been changed - mostly pointless because if the mode has been changed, it's probably because user just did it.

You could argue it alerts you to code you're calling that changes the mode, which would be okay. But perhaps if it was a standard option other code might be less inclined to do that, because it's known by convention to be a bad thing to modify options in a permanent way in package code.

Some hybrid approach might be possible that allows the user to set the default populated in .sf_cache using an option?

Re:

flat approaches to global data (GIS legacy)

This came up today because I dug out an old pipeline that used to work but now failed with current sf. I got an error about 'feature with invalid spherical geometry'. The interesting thing was, I was performing a standard bounding box crop (st_crop) on point geometry with no empty points. In 'GIS legacy' mode it worked perfectly fine. I seem to hit these kinds of things frequently with s2 and sf. I have colleagues that advise calling sf_use_s2(FALSE) in the .Rprofile because of similar experiences. I'm all for the better approaches, the only thing I care about is that they work reliably.

Given we don't currently have that reliability, it makes sense to give users a standard option we can use to disable spherical via s2.

MilesMcBain avatar Jul 19 '22 12:07 MilesMcBain

I think this is a reasonable proposal, I can't see downsides of putting this option in options().

Given we don't currently have that reliability,

I look forward to a reprex from you or your colleagues for anything that can be improved.

edzer avatar Jul 19 '22 12:07 edzer

I got an error about 'feature with invalid spherical geometry'. The interesting thing was, I was performing a standard bounding box crop (st_crop) on point geometry with no empty points. In 'GIS legacy' mode it worked perfectly fine. I seem to hit these kinds of things frequently with s2 and sf.

I also run into this kind of error with some regularity and have found it very frustrating to make many geometries I come across valid. To the point that I just want my personal sf_use_s2() to default to FALSE as the s2 geometry engine requires geometry that is cleaner than I typically get in my day to day work.

Having a system-level option rather than a default I have to constantly change would be a big benefit.

dblodgett-usgs avatar Aug 10 '22 02:08 dblodgett-usgs

(sorry this wan meant to go in a different branch) - this gives us

getOption("sf_use_s2")
# NULL
library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
getOption("sf_use_s2")
# NULL
sf_use_s2()
# [1] TRUE
getOption("sf_use_s2")
# NULL
sf_use_s2(FALSE)
# Spherical geometry (s2) switched off
getOption("sf_use_s2")
# [1] FALSE
sf_use_s2()
# [1] FALSE
getOption("sf_use_s2")
# [1] FALSE
options(sf_use_s2 = FALSE)
sf_use_s2()
# [1] FALSE

and after setting environment variable _SF_USE_S2=false before loading sf gives us

getOption("sf_use_s2")
# NULL
library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is FALSE
getOption("sf_use_s2")
# [1] FALSE
sf_use_s2()
# [1] FALSE
getOption("sf_use_s2")
# [1] FALSE
sf_use_s2(FALSE)
getOption("sf_use_s2")
# [1] FALSE
sf_use_s2()
# [1] FALSE
getOption("sf_use_s2")
# [1] FALSE
options(sf_use_s2 = FALSE)
sf_use_s2()
# [1] FALSE

edzer avatar Aug 10 '22 10:08 edzer