sf icon indicating copy to clipboard operation
sf copied to clipboard

move POINT geometries out of a list column, into a matrix

Open edzer opened this issue 2 years ago • 22 comments

This PR moves POINT geometries out of the list column, leaves a list column filled with NULL values in place, and moves the point coordinates in an n x 2 (or 3 for XYM/XYZ, or 4 for XYZM) matrix attribute called points. The class dim label (XY/XYZ/XYM/XYZM) is put in attribute points_dim. This happens when:

  • st_as_sf.data.frame is called with e.g. coords specifying the coordinate colums
  • WKB in converted into an sfc, i.e. when coming back from st_read() or any of the GEOS functions. Right now, a[] restores the original list of sfg objects:
library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE
data.frame(z=rnorm(3), x=1:3, y=3:1) |>
  st_as_sf(coords = c("x", "y")) -> x
x$geometry |>
  unclass() |> 
  str()
# List of 3
#  $ : NULL
#  $ : NULL
#  $ : NULL
#  - attr(*, "points")= num [1:3, 1:2] 1 2 3 3 2 1
#  - attr(*, "points_dim")= chr "XY"
#  - attr(*, "n_empty")= int 0
#  - attr(*, "precision")= num 0
#  - attr(*, "crs")=List of 2
#   ..$ input: chr NA
#   ..$ wkt  : chr NA
#   ..- attr(*, "class")= chr "crs"
#  - attr(*, "bbox")= 'bbox' Named num [1:4] 1 1 3 3
#   ..- attr(*, "names")= chr [1:4] "xmin" "ymin" "xmax" "ymax"
x$geometry[] |>
  unclass() |> 
  str()
# List of 3
#  $ : 'XY' num [1:2] 1 3
#  $ : 'XY' num [1:2] 2 2
#  $ : 'XY' num [1:2] 3 1
#  - attr(*, "n_empty")= int 0
#  - attr(*, "precision")= num 0
#  - attr(*, "crs")=List of 2
#   ..$ input: chr NA
#   ..$ wkt  : chr NA
#   ..- attr(*, "class")= chr "crs"
#  - attr(*, "bbox")= 'bbox' Named num [1:4] 1 1 3 3
#   ..- attr(*, "names")= chr [1:4] "xmin" "ymin" "xmax" "ymax"

This may give some performance changes, both in memory footprint and runtime; still need to do benchmarks.

Comments welcome!

edzer avatar Dec 13 '22 20:12 edzer

Memory size: decreases with a factor 18:

library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE
n = 1e6
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
st_as_sf(df, coords = c("x", "y")) |>
  st_geometry() -> pts
(s1 = object.size(pts))
# 24002864 bytes
(s2 = object.size(pts[]))
# 432002312 bytes
as.numeric(s2)/as.numeric(s1)
# [1] 17.99795

edzer avatar Dec 13 '22 20:12 edzer

This one is pretty obvious:

library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE
n = 1e6
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
st_as_sf(df, coords = c("x", "y")) |>
  st_geometry() -> pts
pts   |> st_is_empty() |> system.time()
#    user  system elapsed 
#   1.692   0.072   1.764 
system.time(pts0 <- pts[])
#    user  system elapsed 
#   1.104   0.036   1.140 
pts0 |> st_is_empty()  |> system.time()
#    user  system elapsed 
#   1.853   0.057   1.909 
pts |> st_combine() |> system.time()
#    user  system elapsed 
#   0.002   0.000   0.002 
pts0 |> st_combine() |> system.time()
#    user  system elapsed 
#   3.119   0.155   3.275 

edzer avatar Dec 13 '22 22:12 edzer

This is a very cool idea! The use of an attribute to maintain a semblance of backward compatibility with the structure is clever.

I do wonder if this change is going to result in enough performance gains to justify the maintenance cost of the additional representation. My adventures playing with this branch are below...basically all the ways I know about based on the benchmarks above. wk::xy() is more like data.frame(x, y, z) than what you have here...basically just because it doesn't require the matrix attribute shenanigan to work with existing data frames. The GEOS representation is a list() of external pointers to GEOSGeometry(). Geoarrow isn't quite ready for this yet but its kernels will probably be faster than both of those.

While I certainly don't mind if this representation exists in sf, my first impression is that it's rather complicated and not that fast. If performance is the motivation, I wonder if broadening the sf API to work with other S3 classes (e.g., wk::xy(), geos::geos_geometry(), whatever terra has on the go). Functionally that mostly means just making more things S3 generic (then people like me can provide fast implementations in wk and geos without you having to do anything!). Slightly harder would be loosening that assumption for the sf geometry column.

Another low-hanging performance fruit would be to drop the cached bbox and geometry classes (that's most of why the geos package is faster than sf in most cases).

library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.0, PROJ 9.1.1; sf_use_s2() is TRUE

n = 1e5
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
st_as_sf(df, coords = c("x", "y")) |>
    st_geometry() -> pts

pts0 <- pts[]
wk_xy <- wk::as_xy(df)
geos_pts <- geos::as_geos_geometry(wk_xy)

lobstr::obj_size(pts)
#> 2.40 MB
lobstr::obj_size(pts0)
#> 12.80 MB
lobstr::obj_size(wk_xy)
#> 2.40 MB
# not really accurate...only includes R allocs
lobstr::obj_size(geos_pts)
#> 7.20 MB

bench::mark(
    st_is_empty(pts),
    st_is_empty(pts0),
    wk::wk_meta(wk_xy)$size == 0,
    wk::wk_meta(geos_pts)$size == 0,
    geos::geos_is_empty(geos_pts)
)
#> # A tibble: 5 × 6
#>   expression                           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 st_is_empty(pts)                 33.83ms  34.48ms      28.9    1.16MB     12.8
#> 2 st_is_empty(pts0)                46.59ms   46.8ms      21.4    1.15MB     49.8
#> 3 wk::wk_meta(wk_xy)$size == 0      2.06ms   2.25ms     421.      3.1MB     53.9
#> 4 wk::wk_meta(geos_pts)$size == 0   4.21ms   4.43ms     221.     3.06MB     31.1
#> 5 geos::geos_is_empty(geos_pts)   756.29µs 840.21µs    1180.   406.18KB     22.7

bench::mark(
    st_combine(pts),
    st_combine(pts0),
    wk::wk_collection(wk_xy, wk::wk_geometry_type("multipoint")),
    geos::geos_make_collection(geos_pts, "multipoint"),
    check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 4 × 6
#>   expression                                                        min   median
#>   <bch:expr>                                                   <bch:tm> <bch:tm>
#> 1 st_combine(pts)                                               97.99µs 100.61µs
#> 2 st_combine(pts0)                                             139.98ms 181.44ms
#> 3 wk::wk_collection(wk_xy, wk::wk_geometry_type("multipoint"))   1.85ms   2.03ms
#> 4 geos::geos_make_collection(geos_pts, "multipoint")             6.09ms   6.51ms
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>, `gc/sec` <dbl>

paleolimbot avatar Dec 14 '22 02:12 paleolimbot

@rsbivand thanks, that should now work; I finished implementation so far only to pass sf's own tests, this was not in there.

edzer avatar Dec 14 '22 11:12 edzer

I just encountered an other case where this pull request could provide a big speedup. When using print.sfc it can be quite slow (20 seconds for 10^7 points) as the dimensions are calculated in an apply function: https://github.com/r-spatial/sf/blob/main/R/sfc.R#L195 . If this information can directly be taken from the attribute points_dim this should speed up considerably

bart1 avatar Dec 22 '22 08:12 bart1

... which is rather stupid (the 20 seconds), given that #2046 suggests sf reading breaks (unintended) in case of mixed dimensions...

edzer avatar Dec 22 '22 08:12 edzer

Today ALTREP support for lists appeared in R SVN. Do you think it can somehow affect {sf} if it uses lists as data structure?

kadyb avatar Feb 28 '23 22:02 kadyb

Thanks for reporting! I'm not an altrep expert, but would expect no miracles; I think it's mostly useful for (i) data structures out of memory (e.g. in a memory mapped area) or (ii) unrealised arrays (e.g. 1:1e12, where you're going to compute the value when you need it rather than realize it in memory). The first would help with massive point sets (but merely make it possible, not fast), the second doesn't help us here.

The speedup of this PR (if any) is due to the flat memory model of matrices vs. the arbitrary memory layout of lists; accessing lists elements is more expensive because of that; the smaller memory footprint may be even more important.

edzer avatar Feb 28 '23 22:02 edzer

FWIW I actually think ALTREP for lists will be very useful here. It will let you use the data structure you've prototyped here without breaking backwards compatibility with users of the previous data structure (i.e., everybody).

paleolimbot avatar Mar 30 '23 19:03 paleolimbot

Great idea, do you know of an example of such a thing?

edzer avatar Mar 30 '23 20:03 edzer

You can do it today with S3:

library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

point_matrix <- structure(
  matrix(1:6, nrow = 3, ncol = 2),
  class = "point_matrix"
)

length.point_matrix <- function(x) nrow(unclass(x))

# Becomes extract_subset
`[.point_matrix` <- function(x, i) {
  structure(unclass(x)[i, , drop = FALSE], class = "point_matrix")
}

# Becomes VECTOR_ELT
`[[.point_matrix` <- function(x, i) {
  data <- unclass(x)
  st_point(data[i, , drop = TRUE])
}

# Becomes SET_VECTOR_ELT
`[[<-.point_matrix` <- function(x, i, value) {
  data <- unclass(x)
  data[i, ] <- unclass(value)
  structure(data, class = "point_matrix")
}

materialize_point_matrix <- function(x) {
  st_as_sfc(lapply(seq_along(x), function(i) x[[i]]))
}

point_matrix[1:2]
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> attr(,"class")
#> [1] "point_matrix"
point_matrix[[1]]
#> POINT (1 4)
point_matrix[[1]] <- st_point(c(3, 8))
point_matrix
#>      [,1] [,2]
#> [1,]    3    8
#> [2,]    2    5
#> [3,]    3    6
#> attr(,"class")
#> [1] "point_matrix"

materialize_point_matrix(point_matrix)
#> Geometry set for 3 features 
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 2 ymin: 5 xmax: 3 ymax: 8
#> CRS:           NA
#> POINT (3 8)
#> POINT (2 5)
#> POINT (3 6)

ALTREP lists haven't been released yet, but when they are those methods can get rewritten in C and assembled into an ALTREP class (I can help).

The main idea is that, in addition to being a toolbox for geospatial analysis in R, the sfc column format is an ABI-stable way to represent geometries that has enabled an entire ecosystem of geospatial packages to thrive in R. You can and should think about ways to let the toolbox for geospatial analysis part of this package accept more efficient representations of a point vector (including but perhaps not limited to the one you've implemented here); however, a breaking change to the columnar format seems unnecessary.

paleolimbot avatar Mar 31 '23 12:03 paleolimbot

I just played around with this pull request out of curiosity what the consequences would be for move2. I see substantial memory usage reductions, in the example data from 7mb to 2mb. That is great! I also noticed when running the test of the package, some errors occur when adding empty points to an sfc vector. For example st_distance failing here (NaN results). I guess this relates to combining vectors or creating empty points. Here is an example (code works with current sf package):

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
n <- 4
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
st_as_sf(df, coords = c("x", "y")) |>
  st_geometry() -> pts
empty <- st_as_sfc("POINT(EMPTY)", crs = st_crs(pts))
c(pts, empty)
#> Geometry set for 5 features  (with 1 geometry empty)
#> Geometry type: POINT
#> Error in a$points[i, , drop = FALSE]: subscript out of bounds
st_distance(c(pts, empty), c(empty, pts), by_element = T)
#> [1] NaN NaN NaN NaN
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 22.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2023-06-29
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  class         7.3-22  2023-05-03 [1] CRAN (R 4.1.2)
#>  classInt      0.4-9   2023-02-28 [1] CRAN (R 4.1.2)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.1.2)
#>  DBI           1.1.3   2022-06-18 [1] CRAN (R 4.1.2)
#>  digest        0.6.32  2023-06-26 [1] CRAN (R 4.1.2)
#>  dplyr         1.1.2   2023-04-20 [1] CRAN (R 4.1.2)
#>  e1071         1.7-13  2023-02-01 [1] CRAN (R 4.1.2)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.1.2)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.1.2)
#>  fs            1.6.2   2023-04-25 [1] CRAN (R 4.1.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.1.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.2)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.1.2)
#>  KernSmooth    2.23-21 2023-05-03 [1] CRAN (R 4.1.2)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.1.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.2)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.2)
#>  proxy         0.4-27  2022-06-09 [1] CRAN (R 4.1.2)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.1.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.1.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.1.2)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.1.2)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.1.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.2)
#>  Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.1.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.1.2)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.22    2023-06-01 [1] CRAN (R 4.1.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.1.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
#>  sf          * 1.0-13  2023-06-29 [1] Github (r-spatial/sf@e3def06)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.1.2)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.1.2)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.1.2)
#>  units         0.8-2   2023-04-27 [1] CRAN (R 4.1.2)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.1.2)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.1.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.1.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.1.2)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

bart1 avatar Jun 29 '23 14:06 bart1

Great example; that should run better, now.

edzer avatar Jun 29 '23 18:06 edzer

Thanks @edzer , I tested a bit more and noticed two other examples changing for unexpected behavior. Sorry for having these issues iteratively, the popup up in some of the unit test. Let me also know if your not looking for these detailed tests yet

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
n <- 4
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
dfsf<-st_as_sf(df, coords = c("x", "y"))
# The filtered data turns empty:
dfsf |> filter(rep(T,4))
#> Simple feature collection with 4 features and 1 field (with 4 geometries empty)
#> Geometry type: GEOMETRY
#> Dimension:     XY
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           NA
#>             z                 geometry
#> 1 -0.32671172 GEOMETRYCOLLECTION EMPTY
#> 2 -0.05574357 GEOMETRYCOLLECTION EMPTY
#> 3 -1.34424667 GEOMETRYCOLLECTION EMPTY
#> 4 -0.24700773 GEOMETRYCOLLECTION EMPTY
# there is no empty point inserted
dfsf$geometry[2]<-st_point()
dfsf
#> Simple feature collection with 4 features and 1 field (with 4 geometries empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 0.02291049 ymin: 0.2227787 xmax: 0.6861906 ymax: 0.4822213
#> CRS:           NA
#>             z                     geometry
#> 1 -0.29728105  POINT (0.3049286 0.4746749)
#> 2 -0.33397322  POINT (0.5350945 0.2988531)
#> 3 -0.93209304  POINT (0.6861906 0.4822213)
#> 4  0.08929689 POINT (0.02291049 0.2227787)
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 22.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2023-06-30
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  ...
#>  dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.1.2)
#>  ...
#>  sf          * 1.0-14  2023-06-29 [1] Github (r-spatial/sf@3ec76b8)
#>  ...
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

bart1 avatar Jun 30 '23 14:06 bart1

This should now also run.

edzer avatar Jun 30 '23 18:06 edzer

That works for the examples above but filter fails if there is an empty location:

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
n <- 4
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
dfsf<-st_as_sf(df, coords = c("x", "y"))
dfsf$geometry[2]<-st_point()
dfsf |> dplyr::filter(rep(T,4))
#> Error in st_as_sf.data.frame(NextMethod(), coords = attr(.data, "sf_column"), : missing values in coordinates not allowed

I also see a few other changes in behavior/test error but I have not been able to make small reproducible examples out of those yet

bart1 avatar Jun 30 '23 20:06 bart1

It seems dplyr filter drops the CRS:

require(dplyr)
require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
data(meuse, package = "sp")
meuse_sf = st_as_sf(meuse, coords = c("x", "y"), crs = 28992, agr = "constant")
st_crs(meuse_sf)
#> Coordinate Reference System:
#>   User input: EPSG:28992 
#>   wkt:
....
#>         BBOX[50.75,3.2,53.7,7.22]],
#>     ID["EPSG",28992]]
st_crs(dplyr::filter(meuse_sf, copper>80))
#> Coordinate Reference System: NA
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
.....
#>  sf          * 1.0-14  2023-07-01 [1] Github (r-spatial/sf@203187d)
.....
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-07-02 with reprex v2.0.2

bart1 avatar Jul 02 '23 21:07 bart1

Hi, I have been doing some more testing. In general I use the test suite from move2 (most test/functions use sf in the background in some way) to find these changes from before (for the move2 package this pull request would be quite beneficial). Then I try to isolate these errors to sf workflows. Let me know there is no desire to merge this request then I will stop testing.

I have found a few more cases that show some strange/undesired? behavior.

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
d<-st_as_sf(data.frame(x=c(1:4), y=c(1:3,NA)), coords=c("x","y"), na.fail = F)
st_is_empty(st_geometry(d))
#> [1] FALSE FALSE FALSE FALSE
st_is_empty(d) 
#> [1] FALSE FALSE FALSE FALSE
# should be c(F,F,F, T)?, now is c(F,F,F,F)

dput(st_geometry(d))
#> structure(list(NULL, NULL, NULL, NULL), points = structure(c(1, 
#> 2, 3, 4, 1, 2, 3, NA), dim = c(4L, 2L)), points_dim = "XY", n_empty = 0L, precision = 0, crs = structure(list(
#>     input = NA_character_, wkt = NA_character_), class = "crs"), bbox = structure(c(xmin = 1, 
#> ymin = 1, xmax = 4, ymax = 3), class = "bbox"), class = c("sfc_POINT", 
#> "sfc"))
dput(st_geometry(d[1:4,])) 
#> structure(list(structure(c(1, 1), class = c("XY", "POINT", "sfg"
#> )), structure(c(2, 2), class = c("XY", "POINT", "sfg")), structure(c(3, 
#> 3), class = c("XY", "POINT", "sfg")), structure(c(4, NA), class = c("XY", 
#> "POINT", "sfg"))), n_empty = 0L, precision = 0, crs = structure(list(
#>     input = NA_character_, wkt = NA_character_), class = "crs"), bbox = structure(c(xmin = 1, 
#> ymin = 1, xmax = 4, ymax = 3), class = "bbox"), class = c("sfc_POINT", 
#> "sfc"))
# sub setting directly reduces to the old structure, keeping the matrix might be nicer and makes sure the benefits are throughout scripts

e<-st_as_sf(data.frame(x=c(1:3, NA), y=c(1:3,NA)), coords=c("x","y"), na.fail = F)

st_is_empty(st_geometry(e)) 
#> [1] FALSE FALSE FALSE  TRUE
# works correctly
st_is_empty(st_geometry(dplyr::filter(e, rep(T,4)))) 
#> Error in scan(text = lst[[length(lst)]], quiet = TRUE): scan() expected 'a real', got 'ParseException:'
#> Error in (function (msg) : ParseException: Unexpected EOF parsing WKB
# fails,  one difference I see is that a Z dimension is added to points_dim.
sessioninfo::session_info(pkgs = "attached")
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-01-22
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package * version date (UTC) lib source
#>  sf      * 1.0-14  2024-01-22 [1] Github (r-spatial/sf@d38b3dc)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-01-22 with reprex v2.0.2

bart1 avatar Jan 22 '24 14:01 bart1

Another small change/corner case casting and empty sfc fails:

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
st_cast(st_sfc(),"POINT")
#> Error in cls[2, ]: incorrect number of dimensions
sessioninfo::session_info("attached")
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-01-23
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package * version date (UTC) lib source
#>  sf      * 1.0-14  2024-01-22 [1] Github (r-spatial/sf@d38b3dc)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

This used to work fine:

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
st_cast(st_sfc(),"POINT")
#> Geometry set for 0 features 
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           NA
sessioninfo::session_info("attached")
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-01-23
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package * version date (UTC) lib source
#>  sf      * 1.0-15  2023-12-18 [1] CRAN (R 4.3.2)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-01-23 with reprex v2.0.2

bart1 avatar Jan 23 '24 09:01 bart1

Great to see progress! @edzer am I right to notice that this resolves the cast error I posted today but does not address the remarks from yesterday?

bart1 avatar Jan 23 '24 13:01 bart1

Additionally I found this issue with replacing values:

require(sf)
d<-st_as_sf(data.frame(x=1:3,y=1:3), coords = c("x","y"))
e<-st_as_sf(data.frame(x=4:5,y=4:5), coords = c("x","y"))
# Works correctly if both are of the list type:
replace(st_geometry(d)[T,], c(F,T,T), st_geometry(e)[T,])
#> Geometry set for 3 features 
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1 ymin: 1 xmax: 5 ymax: 5
#> CRS:           NA
#> POINT (1 1)
#> POINT (4 4)
#> POINT (5 5)
# However not of the replacement is to the matrix type:
replace(st_geometry(d)[T,], c(F,T,T), st_geometry(e))
#> Geometry set for 3 features  (with 2 geometries empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1 ymin: 1 xmax: 1 ymax: 1
#> CRS:           NA
#> POINT (1 1)
#> POINT EMPTY
#> POINT EMPTY
# Replace is equivalent to:
dd<-st_geometry(d)[T,]
ee<-st_geometry(e)
dd[c(F,T,T)]<-ee
dd
#> Geometry set for 3 features  (with 2 geometries empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1 ymin: 1 xmax: 1 ymax: 1
#> CRS:           NA
#> POINT (1 1)
#> POINT EMPTY
#> POINT EMPTY
sessioninfo::package_info("attached")
#>  package * version date (UTC) lib source
#>  sf      * 1.0-16  2024-01-23 [1] Github (r-spatial/sf@4ce501e)

bart1 avatar Jan 23 '24 15:01 bart1