sf icon indicating copy to clipboard operation
sf copied to clipboard

Failing to speed up `st_union` / find access to `rgeos::gUnaryUnion` via `sf`

Open briatte opened this issue 4 weeks ago • 4 comments

I am trying to speed up a 'dissolve' operation on the following moderately-sized spatial object (link to RDS file, 950 kB), which has an attribute called DCOMIRIS to identify the geometries:

library(sp)
library(rgeos)
library(sf)

packageVersion("sp")    # 2.2.0
packageVersion("rgeos") # 0.6.4
packageVersion("sf")    # 1.0.23

# example data
# https://f.briatte.org/temp/example_spatial_object.rds

xx <- readRDS("example_spatial_object.rds")
class(xx) # SpatialPolygonsDataFrame

length(xx@data$DCOMIRIS) # 992 polygons
sum(duplicated(xx@data$DCOMIRIS)) # 176 polygons with duplicate IDs
# de-duplicated result should have 992 - 176 = 816 polygons

Using rgeos::gUnaryUnion on it gets the job done fast enough:

# [1] merge with {rgeos}

with_rgeos <- rgeos::gUnaryUnion(xx, id = xx@data$DCOMIRIS)
length(with_rgeos) # okay

However, the same operation with sf::st_union is much slower:

# [2] merge with {sf}

with_sf <- sf::st_as_sf(xx) %>%
  group_by(DCOMIRIS) %>%
  summarise(geometry = sf::st_union(geometry))

nrow(with_sf) # okay, but very much slower

I have resorted to splitting the data in two parts, in order to apply sf::st_union only to those polygons that I know share the same ID.

My first attempt to do so is below. It fails for some reason that I cannot understand:

# [3] merge with {sf}, trying to skip single-id polygons

with_sf_fail <- sf::st_as_sf(xx) %>%
  group_by(DCOMIRIS) %>%
  add_count(name = "n_geom") %>%
  summarise(geometry = if_else(n_geom == 1, geometry,
                               sf::st_union(geometry)))

nrow(with_sf_fail) # failed to merge -- why?
# `with_sf_fail` also lost its grouping variable DCOMIRIS -- why?

Splitting the data in two different datasets actually works, but it is still much slower than the rgeos method listed above:

# [4] merge with {sf}, separating the data in two slices

# slice 1, duplicate-ID polygons
z1 <- sf::st_as_sf(xx) %>%
  group_by(DCOMIRIS) %>%
  filter(n() > 1) %>%
  summarise()

# slice 2, single-ID polygons (dropping non-grouping attributes)
z2 <- sf::st_as_sf(xx) %>%
  # line below is slightly faster than having `group_by` separately
  filter(n() == 1, .by = DCOMIRIS) %>%
  select(DCOMIRIS, geometry)

with_sf_split <- sf:::rbind.sf(z1, z2)
nrow(with_sf_split) # okay, but... still slow

Questions

  1. I am wondering whether there is any way to either call rgeos::gUnaryUnion from sf, or to speed up the last 'method' shown above.
  2. As an aside, I am also unsure about where the error lies in the code of my third 'method' (although I am conscious that solving the error would probably not provide me with a solution to my speed issue).

I have read related issue, e.g. #661, but did not get the impression that they have led to a solution. All apologies if they have, or if I missed anything else that should be obvious.

Context: trying to update an old but valuable package that went into forced retirement when rgeos and maptools were discontinued.

briatte avatar Dec 10 '25 17:12 briatte

Could you please provide example data that can be opened? The file is ASCII (text), and:

> xx <- infoRDS("tmp/example_spatial_object.rds")
Error in infoRDS("tmp/example_spatial_object.rds") : unknown input format

Further, which package and where is its public repo?

rsbivand avatar Dec 10 '25 19:12 rsbivand

Dear @rsbivand

Apologies if the file did not work on first try. It should now, and here's what I get on my end after downloading it:

> infoRDS("~/Downloads/example_spatial_object2.rds")
$version
[1] 2

$writer_version
[1] "4.5.2"

$min_reader_version
[1] "2.3.0"

$format
[1] "xdr"

> class(readRDS(file = "~/Downloads/example_spatial_object2.rds"))
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

I am attaching the file to this comment just in case.

Further, which package and where is its public repo?

I am trying to update this package, which uses maptools (and rgeos through it) there.

example_spatial_object2.rds.zip

briatte avatar Dec 10 '25 23:12 briatte

Looking at package spReapportion, have you tried using sf::st_interpolate_aw() as an alternative? See here.

edzer avatar Dec 11 '25 12:12 edzer

For changes to migrate to sf or terra, please see https://github.com/r-spatial/evolution; there are many documents listed there. The package in question was not on CRAN or BioConductor, so was not found in 2022-2023 when we were retiring outdated implementations of interfaces to external software.

To your first question:

> infoRDS("example_spatial_object2.rds")
$version
[1] 2

$writer_version
[1] "4.5.2"

$min_reader_version
[1] "2.3.0"

$format
[1] "xdr"

> library(sp)
> library(sf)
Linking to GEOS 3.14.1, GDAL 3.12.0, PROJ 9.7.1; sf_use_s2() is TRUE
> xx <- readRDS("example_spatial_object2.rds")
> class(xx) # SpatialPolygonsDataFrame
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
> library(dplyr)
> system.time(xx |> sf::st_as_sf() |> group_by(DCOMIRIS) |> summarise(geometry = sf::st_union(geometry)) -> with_sf)
   user  system elapsed 
  8.562   0.611   9.327 
> system.time(xx[, "DCOMIRIS"] |> sf::st_as_sf() |> aggregate(list(xx$DCOMIRIS), head, n=1) -> with_sf1)
   user  system elapsed 
  0.339   0.004   0.345 
> length(xx$DCOMIRIS)
[1] 992
> length(unique(xx$DCOMIRIS))
[1] 979
> sum(duplicated(xx$DCOMIRIS))
[1] 13
> length(unique(xx$DCOMIRIS)) + sum(duplicated(xx$DCOMIRIS))
[1] 992

sf::aggregate is the correct match for rgeos::unaryUnion.

As @edzer wrote, you may well find that sf::st_interpolate_aw offers what your package provided.

If you need to re-create output from your package, you'll find MacOS binaries of retired packages for R 4.2 and 4.3, and Windows binaries for 4.2 - 4.4 at https://github.com/r-spatial/evolution/tree/main/backstore (on Linux you would install rgeos from source on a version of GEOS matching the R package). I would try to see how st_interpolate_aw matches your desired output, based on https://r-spatial.org/book/05-Attributes.html#sec-area-weighted and references therein. If it does, simply use that. If not, establish a baseline output from installing your package on an older version of R with matching rgeos (easiest as binary), and re-implement the functionality you need possibly using materials in https://github.com/r-spatial/evolution with sf.

A side note - handling of coordinate reference systems has also changed, where I think the EPSG code for what you had in the example data set is 2154.

rsbivand avatar Dec 11 '25 13:12 rsbivand

Dear @rsbivand

This works perfectly, thank you very much!

@edzer -- I will make sure to take a look at sf::st_interpolate_aw() too, thanks for this.

briatte avatar Dec 16 '25 12:12 briatte