sf

dplyr::bind_rows binds `sf` with different crs without error/warning or reprojection

Open · MilesMcBain opened this issue 3 years ago · 3 comments

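This silently produces invalid data. In the output below, the point from `sf2` (EPSG:3112) keeps its coordinates but is labelled with the CRS of `sf1` (GDA94). The downstream effects are unpredictable and difficult to diagnose.

It should probably raise an error, as `rbind()` does.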

Reprex:

library(sf)
#> Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
bind_rows(
  sf1 = st_sf(a=3, st_sfc(st_point(1:2)), crs = 4283),
  sf2 = st_sf(a=3, st_sfc(st_point(1:2)), crs = 3112)
)
#> Simple feature collection with 2 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1 ymin: 2 xmax: 1 ymax: 2
#> Geodetic CRS:  GDA94
#>   a st_sfc.st_point.1.2..
#> 1 3           POINT (1 2)
#> 2 3           POINT (1 2)

Created on 2022-01-13 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info  ──────────────────────────────────────────────────────────────
#>  hash: men holding hands: medium skin tone, couple with heart: woman, woman, baby angel: light skin tone
#>
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 20.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       Etc/UTC
#>  date     2022-01-13
#>  pandoc   2.5 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] RSPM (R 4.1.0)
#>  backports     1.2.1   2020-12-09 [1] RSPM (R 4.1.0)
#>  class         7.3-19  2021-05-03 [4] CRAN (R 4.0.5)
#>  classInt      0.4-3   2020-04-07 [1] RSPM (R 4.1.0)
#>  cli           3.1.0   2021-10-27 [1] CRAN (R 4.1.2)
#>  crayon        1.4.2   2021-10-29 [1] CRAN (R 4.1.2)
#>  DBI           1.1.1   2021-01-15 [1] Custom
#>  digest        0.6.29  2021-12-01 [1] Custom
#>  dplyr       * 1.0.7   2021-06-18 [1] RSPM (R 4.1.0)
#>  e1071         1.7-9   2021-09-16 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2   2021-04-29 [1] RSPM (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] RSPM (R 4.1.0)
#>  fansi         0.5.0   2021-05-25 [1] RSPM (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] RSPM (R 4.1.0)
#>  fs            1.5.2   2021-12-08 [1] RSPM (R 4.1.2)
#>  generics      0.1.1   2021-10-25 [1] CRAN (R 4.1.2)
#>  glue          1.6.0   2021-12-17 [1] CRAN (R 4.1.2)
#>  highr         0.9     2021-04-16 [1] RSPM (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.2)
#>  KernSmooth    2.23-20 2021-05-03 [4] CRAN (R 4.0.5)
#>  knitr         1.36    2021-09-29 [1] RSPM (R 4.1.2)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.2)
#>  magrittr      2.0.1   2020-11-17 [1] RSPM (R 4.1.0)
#>  pillar        1.6.4   2021-10-18 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.1.0)
#>  proxy         0.4-26  2021-06-07 [1] RSPM (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] RSPM (R 4.1.0)
#>  R6            2.5.1   2021-08-19 [1] RSPM (R 4.1.0)
#>  Rcpp          1.0.7   2021-07-07 [1] RSPM (R 4.1.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.2)
#>  rlang         0.4.12  2021-10-18 [1] RSPM (R 4.1.2)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.2)
#>  sessioninfo   1.2.1   2021-11-02 [1] RSPM (R 4.1.2)
#>  sf          * 1.0-5   2021-12-17 [1] CRAN (R 4.1.2)
#>  stringi       1.7.6   2021-11-29 [1] Custom
#>  stringr       1.4.0   2019-02-10 [1] RSPM (R 4.1.0)
#>  styler        1.5.1   2021-07-13 [1] RSPM (R 4.1.0)
#>  tibble        3.1.6   2021-11-07 [1] CRAN (R 4.1.2)
#>  tidyselect    1.1.1   2021-04-30 [1] RSPM (R 4.1.0)
#>  units         0.7-2   2021-06-08 [1] RSPM (R 4.1.0)
#>  utf8          1.2.2   2021-07-24 [1] RSPM (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] RSPM (R 4.1.0)
#>  withr         2.4.3   2021-11-30 [1] CRAN (R 4.1.2)
#>  xfun          0.28    2021-11-04 [1] CRAN (R 4.1.2)
#>  yaml          2.2.1   2020-02-01 [1] RSPM (R 4.1.0)
#>
#>  [1] /home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
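Until something like this lands upstream, the check the issue asks for (stop, as `rbind()` does) can be approximated in user code. A minimal sketch, assuming every argument is an sf object passed individually rather than as a list; `bind_rows_same_crs()` is a hypothetical name, not part of sf or dplyr:

```r
library(sf)
library(dplyr)

# Hypothetical wrapper (not part of sf or dplyr): bind sf objects only when
# they all share the CRS of the first argument; otherwise stop with the
# same message rbind() gives.
bind_rows_same_crs <- function(...) {
  dots <- list(...)
  ref <- st_crs(dots[[1]])
  # crs objects support ==, which compares the CRS definitions
  same <- vapply(dots, function(x) st_crs(x) == ref, logical(1))
  if (!all(same)) {
    stop("arguments have different crs", call. = FALSE)
  }
  bind_rows(...)
}

sf1 <- st_sf(a = 3, geometry = st_sfc(st_point(1:2)), crs = 4283)
sf2 <- st_sf(a = 3, geometry = st_sfc(st_point(1:2)), crs = 3112)

bind_rows_same_crs(sf1, sf1)       # both GDA94, binds as usual
try(bind_rows_same_crs(sf1, sf2))  # stops instead of relabelling sf2
```

The wrapper only guards calls that go through it; plain `bind_rows()` calls elsewhere are unaffected.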

MilesMcBain, Jan 13 '22 01:01

Wouldn't that be the equivalent of merging a dataset of patients with weight recorded in kg and lb, and then being surprised that BMI no longer works? ;)

RPanczak, Jan 13 '22 07:01

@RPanczak yes, coordinate reference systems are analogous to measurement units. The units package lets you trap such errors, or resolve them automatically when a unit conversion is possible:

library(units)
# udunits database from /usr/share/xml/udunits/udunits2.xml
a = set_units(1:3, kg)
b = set_units(4:6, lb)
c(a, b)
# Units: [kg]
# [1] 1.000000 2.000000 3.000000 1.814369 2.267962 2.721554
c(b, a)
# Units: [lb]
# [1] 4.000000 5.000000 6.000000 2.204623 4.409245 6.613868
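Following the units analogy, one could mimic `c()` on units vectors, which converts everything to the first argument's unit, by reprojecting every input to the first argument's CRS before binding. A minimal sketch; `bind_rows_to_first_crs()` is a hypothetical helper, not an sf or dplyr function, and it assumes every argument is an sf object whose CRS is known and correctly declared (`st_transform()` cannot reproject from a missing CRS):

```r
library(sf)
library(dplyr)

# Hypothetical wrapper: like c() on units vectors, convert every input to
# the CRS of the first argument, then bind.
bind_rows_to_first_crs <- function(...) {
  dots <- list(...)
  target <- st_crs(dots[[1]])
  dots <- lapply(dots, function(x) {
    # Only reproject inputs whose CRS differs from the target
    if (st_crs(x) == target) x else st_transform(x, target)
  })
  bind_rows(dots)
}

# The same location (roughly Canberra) stored in two GDA94-based CRSs:
# geographic degrees (EPSG:4283) and Lambert conformal conic metres (EPSG:3112)
geo <- st_sf(id = 1, geometry = st_sfc(st_point(c(149.13, -35.28))), crs = 4283)
lcc <- st_transform(geo, 3112)

out <- bind_rows_to_first_crs(geo, lcc)
st_coordinates(out)  # both rows now in GDA94 degrees
```

Silent reprojection is a design choice: it is convenient, but it assumes each input's declared CRS is right, whereas stopping (as `rbind()` does) surfaces the mismatch for the user to decide.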

@MilesMcBain thanks! That is an issue, but AFAICT not one that package sf can resolve, as `bind_rows` is a function, not a generic. @lionel- I would have hoped a check for a common CRS would happen in the vctrs support, e.g. here, but it doesn't?
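To illustrate the generic-versus-function point: `rbind()` dispatches on the class of its arguments, so sf's own method runs and can compare CRSs. A minimal sketch reusing the inputs from the original reprex; the `do.call()` pattern for a list of layers is an assumption about typical usage, not something proposed in the thread:

```r
library(sf)

sf1 <- st_sf(a = 3, geometry = st_sfc(st_point(1:2)), crs = 4283)
sf2 <- st_sf(a = 3, geometry = st_sfc(st_point(1:2)), crs = 3112)

# rbind() is an S3 generic, so this call dispatches to sf's rbind method,
# which compares the CRSs and stops when they differ
msg <- tryCatch(rbind(sf1, sf2), error = conditionMessage)
print(msg)

# For a list of layers (a common reason to reach for bind_rows()),
# do.call() gives the same dispatch and therefore the same check
layers <- list(sf1, sf2)
msg_list <- tryCatch(do.call(rbind, layers), error = conditionMessage)
print(msg_list)
```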

edzer, Jan 13 '22 08:01

Thank you for the pointer to the units trick, @edzer. I've come to use it more and more, and once again learnt how capable it is!

RPanczak, Jan 13 '22 09:01

I also just ran into this issue; it can result in some pretty strange coordinates (`rbind`, by contrast, behaves reasonably and raises an error):

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
data(meuse, package = "sp")
meuse_sf = st_as_sf(meuse, coords = c("x", "y"), crs = 28992, agr = "constant")
a <- meuse_sf |> st_transform(4326)
dplyr::bind_rows(a[1,1], meuse_sf[2,1])
#> Simple feature collection with 2 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 5.758536 ymin: 50.99156 xmax: 181025 ymax: 333558
#> Geodetic CRS:  WGS 84
#>   cadmium                  geometry
#> 1    11.7 POINT (5.758536 50.99156)
#> 2     8.6     POINT (181025 333558)
rbind(a[1,1], meuse_sf[2,1])
#> Error: arguments have different crs

Created on 2023-01-19 with reprex v2.0.2
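For the example above, the immediate workaround is to reproject explicitly before binding. A minimal sketch that rebuilds the two points from the printed output rather than loading `meuse` (so it does not need the sp package); the coordinates and cadmium values are copied from that output:

```r
library(sf)
library(dplyr)

# Point 1 was already transformed to WGS 84 (EPSG:4326), in degrees;
# point 2 is still in the Dutch RD New grid (EPSG:28992), in metres
a <- st_sf(cadmium = 11.7,
           geometry = st_sfc(st_point(c(5.758536, 50.99156))), crs = 4326)
b <- st_sf(cadmium = 8.6,
           geometry = st_sfc(st_point(c(181025, 333558))), crs = 28992)

# Reproject b to a's CRS first, so bind_rows() only ever sees one CRS
fixed <- bind_rows(a, st_transform(b, st_crs(a)))
st_coordinates(fixed)  # both rows now in degrees
```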

bart1, Jan 19 '23 13:01

Thanks!

bart1, Jan 19 '23 15:01