dplyr::bind_rows binds `sf` with different crs without error/warning or reprojection
This leads to invalid data causing unpredictable downstream effects that are difficult to diagnose.
It should probably error, as per `rbind`.
Reprex:
library(sf)
#> Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
bind_rows(
sf1 = st_sf(a=3, st_sfc(st_point(1:2)), crs = 4283),
sf2 = st_sf(a=3, st_sfc(st_point(1:2)), crs = 3112)
)
#> Simple feature collection with 2 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 1 ymin: 2 xmax: 1 ymax: 2
#> Geodetic CRS: GDA94
#> a st_sfc.st_point.1.2..
#> 1 3 POINT (1 2)
#> 2 3 POINT (1 2)
Created on 2022-01-13 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────────
#> hash: men holding hands: medium skin tone, couple with heart: woman, woman, baby angel: light skin tone
#>
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Ubuntu 20.04.3 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C.UTF-8
#> ctype C.UTF-8
#> tz Etc/UTC
#> date 2022-01-13
#> pandoc 2.5 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.1.0)
#> backports 1.2.1 2020-12-09 [1] RSPM (R 4.1.0)
#> class 7.3-19 2021-05-03 [4] CRAN (R 4.0.5)
#> classInt 0.4-3 2020-04-07 [1] RSPM (R 4.1.0)
#> cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.2)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.2)
#> DBI 1.1.1 2021-01-15 [1] Custom
#> digest 0.6.29 2021-12-01 [1] Custom
#> dplyr * 1.0.7 2021-06-18 [1] RSPM (R 4.1.0)
#> e1071 1.7-9 2021-09-16 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] RSPM (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] RSPM (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] RSPM (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] RSPM (R 4.1.0)
#> fs 1.5.2 2021-12-08 [1] RSPM (R 4.1.2)
#> generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.2)
#> glue 1.6.0 2021-12-17 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] RSPM (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.2)
#> KernSmooth 2.23-20 2021-05-03 [4] CRAN (R 4.0.5)
#> knitr 1.36 2021-09-29 [1] RSPM (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.2)
#> magrittr 2.0.1 2020-11-17 [1] RSPM (R 4.1.0)
#> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.2)
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.1.0)
#> proxy 0.4-26 2021-06-07 [1] RSPM (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] RSPM (R 4.1.0)
#> R6 2.5.1 2021-08-19 [1] RSPM (R 4.1.0)
#> Rcpp 1.0.7 2021-07-07 [1] RSPM (R 4.1.0)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.2)
#> rlang 0.4.12 2021-10-18 [1] RSPM (R 4.1.2)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.2)
#> sessioninfo 1.2.1 2021-11-02 [1] RSPM (R 4.1.2)
#> sf * 1.0-5 2021-12-17 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] Custom
#> stringr 1.4.0 2019-02-10 [1] RSPM (R 4.1.0)
#> styler 1.5.1 2021-07-13 [1] RSPM (R 4.1.0)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> tidyselect 1.1.1 2021-04-30 [1] RSPM (R 4.1.0)
#> units 0.7-2 2021-06-08 [1] RSPM (R 4.1.0)
#> utf8 1.2.2 2021-07-24 [1] RSPM (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] RSPM (R 4.1.0)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
#> xfun 0.28 2021-11-04 [1] CRAN (R 4.1.2)
#> yaml 2.2.1 2020-02-01 [1] RSPM (R 4.1.0)
#>
#> [1] /home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Wouldn't that be equivalent of merging dataset of patients with weight recorded in kgs and lbs and being surprised that BMI doesn't work any longer? ;)
@RPanczak yes, coordinate reference systems have an analogy to measurement units; there is a package called `units` that lets you trap such errors, or automatically resolve them (if unit conversion is possible):
library(units)
# udunits database from /usr/share/xml/udunits/udunits2.xml
a = set_units(1:3, kg)
b = set_units(4:6, lb)
c(a, b)
# Units: [kg]
# [1] 1.000000 2.000000 3.000000 1.814369 2.267962 2.721554
c(b, a)
# Units: [lb]
# [1] 4.000000 5.000000 6.000000 2.204623 4.409245 6.613868
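The example above shows the automatic-conversion case. A minimal sketch of the "trap" case, assuming two quantities with no possible conversion between them (the names `mass` and `len` are only illustrative):

```r
library(units)

mass <- set_units(1:3, kg)
len  <- set_units(4:6, m)

# Mass and length cannot be converted into each other, so c() is expected
# to signal an error rather than silently mixing the values.
res <- tryCatch(c(mass, len), error = function(e) e)
inherits(res, "error")
```

This is the behaviour one would hope for when binding geometries with different coordinate reference systems.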
@MilesMcBain thanks! That is an issue, but AFAICT not an issue that package `sf` can resolve, as `bind_rows` is a function, not a generic. @lionel- I would have hoped a check for common CRS would happen in the vctrs support e.g. here, but it doesn't?
Thank you for the pointer to the `units` trick, @edzer. I've grown to use it more and more and once again learnt how capable it is!
I also just found this issue, as it can result in some pretty strange coordinates (`rbind` behaves in a reasonable way):
require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
data(meuse, package = "sp")
meuse_sf = st_as_sf(meuse, coords = c("x", "y"), crs = 28992, agr = "constant")
a <- meuse_sf |> st_transform(4326)
dplyr::bind_rows(a[1,1], meuse_sf[2,1])
#> Simple feature collection with 2 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 5.758536 ymin: 50.99156 xmax: 181025 ymax: 333558
#> Geodetic CRS: WGS 84
#> cadmium geometry
#> 1 11.7 POINT (5.758536 50.99156)
#> 2 8.6 POINT (181025 333558)
rbind(a[1,1], meuse_sf[2,1])
#> Error: arguments have different crs
Created on 2023-01-19 with reprex v2.0.2
Thanks!
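Until `bind_rows` checks this itself, one possible guard is to compare the CRS of every input before binding. The following is only a sketch: `bind_sf_checked()` and its `reproject` argument are hypothetical names, not part of sf or dplyr. It relies on `st_crs()`, the `==` comparison for `crs` objects, `st_transform()`, and `rbind()` for `sf` objects, and it assumes all inputs share the same columns.

```r
library(sf)

# Hypothetical helper: bind sf objects row-wise, but only once they all
# share the CRS of the first argument.
bind_sf_checked <- function(..., reproject = FALSE) {
  objs <- list(...)
  target <- st_crs(objs[[1]])
  # TRUE for every input whose CRS matches that of the first input
  same <- vapply(objs, function(x) st_crs(x) == target, logical(1))
  if (!all(same)) {
    if (!reproject) {
      stop("arguments have different crs", call. = FALSE)
    }
    # Opt-in: reproject the mismatching inputs to the CRS of the first one
    objs[!same] <- lapply(objs[!same], st_transform, crs = target)
  }
  do.call(rbind, objs)
}

p1 <- st_sf(a = 1, geometry = st_sfc(st_point(c(5.758536, 50.99156)), crs = 4326))
p2 <- st_sf(a = 2, geometry = st_sfc(st_point(c(181025, 333558)), crs = 28992))

# Errors by default; reprojects p2 to EPSG:4326 when asked to
out <- bind_sf_checked(p1, p2, reproject = TRUE)
```

With `reproject = FALSE` this behaves much like plain `rbind`; the explicit flag just makes reprojection a deliberate choice rather than something that happens silently.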