sf icon indicating copy to clipboard operation
sf copied to clipboard

`st_as_sf.ppplist()` does not add `marks` columns from `spatstat` objects and extracts only the window of the first `ppplist` element

Open henningte opened this issue 2 years ago • 7 comments

st_as_sf.ppplist() does not add marks columns from the spatstat objects in the ppplist, i.e. if the objects in the ppplist have any additional variables in the marks element, these will be dropped.

I don't know if this behavior is expected given that the conversion methods to sf from individual spatstat classes (e.g. ppp, psp) add the variables from the marks element.

Here is an example:

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(spatstat.geom)
#> Warning: package 'spatstat.geom' was built under R version 4.0.5
#> Loading required package: spatstat.data
#> Warning: package 'spatstat.data' was built under R version 4.0.5
#> Registered S3 method overwritten by 'spatstat.geom':
#>   method     from
#>   print.boxx cli
#> spatstat.geom 2.4-0

# get some ppp objects to combine in a list
gorillas1 <- gorillas
gorillas2 <- gorillas

# show mark colnames
colnames(gorillas1$marks)
#> [1] "group"  "season" "date"

# convert to sf: problem: marks are dropped
x_sf_ppplist <- st_as_sf(spatstat.geom::solist(gorillas1, gorillas2)) 
colnames(x_sf_ppplist)
#> [1] "label" "geom"


# this contrasts the behavior when only one ppp object is converted to sf (marks are added as columns)
x_sf_ppp <- st_as_sf(gorillas1)
colnames(x_sf_ppp)
#> [1] "group"  "season" "date"   "label"  "geom"

Created on 2022-04-03 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.1 (2020-06-06)
#>  os       Windows 10 x64              
#>  system   i386, mingw32               
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2022-04-03                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package        * version date       lib source                             
#>  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.0.0)                     
#>  backports        1.1.7   2020-05-13 [1] CRAN (R 4.0.0)                     
#>  blob             1.2.1   2020-01-20 [1] CRAN (R 4.0.0)                     
#>  class            7.3-17  2020-04-26 [2] CRAN (R 4.0.1)                     
#>  classInt         0.4-3   2020-04-07 [1] CRAN (R 4.0.0)                     
#>  cli              3.1.0   2021-10-27 [1] CRAN (R 4.0.5)                     
#>  crayon           1.4.2   2021-10-29 [1] CRAN (R 4.0.5)                     
#>  DBI              1.1.0   2019-12-15 [1] CRAN (R 4.0.0)                     
#>  deldir           1.0-6   2021-10-23 [1] CRAN (R 4.0.5)                     
#>  digest           0.6.25  2020-02-23 [1] CRAN (R 4.0.0)                     
#>  dplyr            1.0.7   2021-06-18 [1] CRAN (R 4.0.5)                     
#>  e1071            1.7-3   2019-11-26 [1] CRAN (R 4.0.0)                     
#>  ellipsis         0.3.2   2021-04-29 [1] CRAN (R 4.0.5)                     
#>  evaluate         0.14    2019-05-28 [1] CRAN (R 4.0.0)                     
#>  fansi            0.4.1   2020-01-08 [1] CRAN (R 4.0.0)                     
#>  fs               1.4.1   2020-04-04 [1] CRAN (R 4.0.0)                     
#>  generics         0.0.2   2018-11-29 [1] CRAN (R 4.0.0)                     
#>  glue             1.4.1   2020-05-13 [1] CRAN (R 4.0.2)                     
#>  highr            0.8     2019-03-20 [1] CRAN (R 4.0.0)                     
#>  htmltools        0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)                     
#>  KernSmooth       2.23-17 2020-04-26 [2] CRAN (R 4.0.1)                     
#>  knitr            1.37    2021-12-16 [1] CRAN (R 4.0.5)                     
#>  lattice          0.20-41 2020-04-02 [2] CRAN (R 4.0.1)                     
#>  lifecycle        1.0.1   2021-09-24 [1] CRAN (R 4.0.5)                     
#>  magrittr         1.5     2014-11-22 [1] CRAN (R 4.0.0)                     
#>  Matrix           1.2-18  2019-11-27 [2] CRAN (R 4.0.1)                     
#>  pillar           1.6.4   2021-10-18 [1] CRAN (R 4.0.5)                     
#>  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.0.0)                     
#>  polyclip         1.10-0  2019-03-14 [1] CRAN (R 4.0.0)                     
#>  purrr            0.3.4   2020-04-17 [1] CRAN (R 4.0.0)                     
#>  R6               2.4.1   2019-11-12 [1] CRAN (R 4.0.0)                     
#>  Rcpp             1.0.8   2022-01-13 [1] CRAN (R 4.0.5)                     
#>  reprex           2.0.1   2021-08-05 [1] CRAN (R 4.0.5)                     
#>  rlang            0.4.12  2021-10-18 [1] CRAN (R 4.0.5)                     
#>  rmarkdown        2.11    2021-09-14 [1] CRAN (R 4.0.5)                     
#>  rstudioapi       0.11    2020-02-07 [1] CRAN (R 4.0.0)                     
#>  sessioninfo      1.1.1   2018-11-05 [1] CRAN (R 4.0.0)                     
#>  sf             * 1.0-8   2022-03-26 [1] Github (r-spatial/sf@649b0d7)      
#>  spatstat.data  * 2.1-4   2022-03-29 [1] CRAN (R 4.0.5)                     
#>  spatstat.geom  * 2.4-0   2022-03-29 [1] CRAN (R 4.0.5)                     
#>  spatstat.utils   2.3-0   2021-12-12 [1] CRAN (R 4.0.5)                     
#>  stringi          1.4.6   2020-02-17 [1] CRAN (R 4.0.0)                     
#>  stringr          1.4.0   2019-02-10 [1] CRAN (R 4.0.0)                     
#>  styler           1.3.2   2020-02-23 [1] CRAN (R 4.0.2)                     
#>  tibble           3.1.6   2021-11-07 [1] CRAN (R 4.0.5)                     
#>  tidyselect       1.1.0   2020-05-11 [1] CRAN (R 4.0.0)                     
#>  units            0.8-0   2022-03-04 [1] Github (r-quantities/units@92e363a)
#>  utf8             1.1.4   2018-05-24 [1] CRAN (R 4.0.0)                     
#>  vctrs            0.3.8   2021-04-29 [1] CRAN (R 4.0.5)                     
#>  withr            2.4.3   2021-11-30 [1] CRAN (R 4.0.5)                     
#>  xfun             0.29    2021-12-14 [1] CRAN (R 4.0.5)                     
#>  yaml             2.2.1   2020-02-01 [1] CRAN (R 4.0.0)                     
#> 
#> [1] C:/Users/henni/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.1/library

I would assume (or suggest) that st_as_sf.ppplist() could do the following things: (1) Apply the respective st_as_sf() method to each individual object in the ppplist object. (2) Combine the result using rbind. Would this be a good idea?

Here is a suggested implementation (note that I don't know the spatstat classes well and therefore I can't say if there are any pitfalls I have overlooked):

# suggested solution: Why not use lapply to apply st_as_sf.x (where `x` is the respective method for the current object) and rbind.sf to combine the result?
st_as_sf.ppplist = function(x, ...) {
  ret = lapply(x, st_as_sf)
  
  # fill missing columns
  ret_column_names = unique(unlist(lapply(ret, colnames)))
  ret_missing_columns = 
    lapply(seq_along(ret), function(i) {
      i_column_names_missing = setdiff(ret_column_names, colnames(ret[[i]]))
      i_missing = lapply(i_column_names_missing, function(y) {
        y_ret = data.frame(x = rep(NA, nrow(ret[[i]])))
        colnames(y_ret) = y
        y_ret
      })
      i_missing = do.call("cbind", i_missing)
    })
  ret = lapply(seq_along(ret), function(i) {
    cbind(ret[[i]], ret_missing_columns[[i]])
  })
  
  # combine
  do.call("rbind", ret)
}

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(spatstat.geom)
#> Warning: package 'spatstat.geom' was built under R version 4.0.5
#> Loading required package: spatstat.data
#> Warning: package 'spatstat.data' was built under R version 4.0.5
#> Registered S3 method overwritten by 'spatstat.geom':
#>   method     from
#>   print.boxx cli
#> spatstat.geom 2.4-0

# get some ppp objects to combine in a list
gorillas1 <- gorillas
gorillas2 <- gorillas

# add some additional variables to check that binding marks also works when column names differ
gorillas1 <- gorillas
gorillas1$marks$a <- seq_len(nrow(gorillas1$marks))
gorillas2 <- gorillas
gorillas2$marks$b <- rev(seq_len(nrow(gorillas2$marks)))

x_sf_ppplist <- st_as_sf(spatstat.geom::solist(gorillas1, gorillas2))
colnames(x_sf_ppplist)
#> [1] "group"  "season" "date"   "a"      "label"  "b"      "geom"

Created on 2022-04-03 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.1 (2020-06-06)
#>  os       Windows 10 x64              
#>  system   i386, mingw32               
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2022-04-03                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package        * version date       lib source                             
#>  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.0.0)                     
#>  backports        1.1.7   2020-05-13 [1] CRAN (R 4.0.0)                     
#>  blob             1.2.1   2020-01-20 [1] CRAN (R 4.0.0)                     
#>  class            7.3-17  2020-04-26 [2] CRAN (R 4.0.1)                     
#>  classInt         0.4-3   2020-04-07 [1] CRAN (R 4.0.0)                     
#>  cli              3.1.0   2021-10-27 [1] CRAN (R 4.0.5)                     
#>  crayon           1.4.2   2021-10-29 [1] CRAN (R 4.0.5)                     
#>  DBI              1.1.0   2019-12-15 [1] CRAN (R 4.0.0)                     
#>  deldir           1.0-6   2021-10-23 [1] CRAN (R 4.0.5)                     
#>  digest           0.6.25  2020-02-23 [1] CRAN (R 4.0.0)                     
#>  dplyr            1.0.7   2021-06-18 [1] CRAN (R 4.0.5)                     
#>  e1071            1.7-3   2019-11-26 [1] CRAN (R 4.0.0)                     
#>  ellipsis         0.3.2   2021-04-29 [1] CRAN (R 4.0.5)                     
#>  evaluate         0.14    2019-05-28 [1] CRAN (R 4.0.0)                     
#>  fansi            0.4.1   2020-01-08 [1] CRAN (R 4.0.0)                     
#>  fs               1.4.1   2020-04-04 [1] CRAN (R 4.0.0)                     
#>  generics         0.0.2   2018-11-29 [1] CRAN (R 4.0.0)                     
#>  glue             1.4.1   2020-05-13 [1] CRAN (R 4.0.2)                     
#>  highr            0.8     2019-03-20 [1] CRAN (R 4.0.0)                     
#>  htmltools        0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)                     
#>  KernSmooth       2.23-17 2020-04-26 [2] CRAN (R 4.0.1)                     
#>  knitr            1.37    2021-12-16 [1] CRAN (R 4.0.5)                     
#>  lattice          0.20-41 2020-04-02 [2] CRAN (R 4.0.1)                     
#>  lifecycle        1.0.1   2021-09-24 [1] CRAN (R 4.0.5)                     
#>  magrittr         1.5     2014-11-22 [1] CRAN (R 4.0.0)                     
#>  Matrix           1.2-18  2019-11-27 [2] CRAN (R 4.0.1)                     
#>  pillar           1.6.4   2021-10-18 [1] CRAN (R 4.0.5)                     
#>  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.0.0)                     
#>  polyclip         1.10-0  2019-03-14 [1] CRAN (R 4.0.0)                     
#>  purrr            0.3.4   2020-04-17 [1] CRAN (R 4.0.0)                     
#>  R6               2.4.1   2019-11-12 [1] CRAN (R 4.0.0)                     
#>  Rcpp             1.0.8   2022-01-13 [1] CRAN (R 4.0.5)                     
#>  reprex           2.0.1   2021-08-05 [1] CRAN (R 4.0.5)                     
#>  rlang            0.4.12  2021-10-18 [1] CRAN (R 4.0.5)                     
#>  rmarkdown        2.11    2021-09-14 [1] CRAN (R 4.0.5)                     
#>  rstudioapi       0.11    2020-02-07 [1] CRAN (R 4.0.0)                     
#>  sessioninfo      1.1.1   2018-11-05 [1] CRAN (R 4.0.0)                     
#>  sf             * 1.0-8   2022-03-26 [1] Github (r-spatial/sf@649b0d7)      
#>  spatstat.data  * 2.1-4   2022-03-29 [1] CRAN (R 4.0.5)                     
#>  spatstat.geom  * 2.4-0   2022-03-29 [1] CRAN (R 4.0.5)                     
#>  spatstat.utils   2.3-0   2021-12-12 [1] CRAN (R 4.0.5)                     
#>  stringi          1.4.6   2020-02-17 [1] CRAN (R 4.0.0)                     
#>  stringr          1.4.0   2019-02-10 [1] CRAN (R 4.0.0)                     
#>  styler           1.3.2   2020-02-23 [1] CRAN (R 4.0.2)                     
#>  tibble           3.1.6   2021-11-07 [1] CRAN (R 4.0.5)                     
#>  tidyselect       1.1.0   2020-05-11 [1] CRAN (R 4.0.0)                     
#>  units            0.8-0   2022-03-04 [1] Github (r-quantities/units@92e363a)
#>  utf8             1.1.4   2018-05-24 [1] CRAN (R 4.0.0)                     
#>  vctrs            0.3.8   2021-04-29 [1] CRAN (R 4.0.5)                     
#>  withr            2.4.3   2021-11-30 [1] CRAN (R 4.0.5)                     
#>  xfun             0.29    2021-12-14 [1] CRAN (R 4.0.5)                     
#>  yaml             2.2.1   2020-02-01 [1] CRAN (R 4.0.0)                     
#> 
#> [1] C:/Users/henni/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.1/library

henningte avatar Apr 03 '22 04:04 henningte

Currently a MULTIPOINT is created for each item in the ppp list. Wouldn't a nested tibble make more sense - what if two group items have different number and/or names of marks? How to find out which group a point belonged to after simply rbind-ing the data.frame's?

edzer avatar Apr 04 '22 11:04 edzer

Ok, that is a point I have not considered.

As mentioned, I don't know the spatstat classes well. I see (based on this restricted background knowledge) that it might be sensible to nest the individual elements in a ppplist. But in my opinion, this currently creates one problem and one unintuitive behavior:

  1. The problem: marks are completely dropped with st_as_sf.ppplist().
  2. The unintuitive behavior: st_as_sf.ppp() returns a non-MULTIPOINT, but st_as_sf.ppplist() does not.

I think the problem should be solved. But I think that whether a nested tibble is returned or not is more a matter of taste (based on my restricted knowledge about spatstat classes) - and perhaps it's only me finding this unintuitive at first. I personally would prefer (but I don't imply this is the best solution) the following solution:

  • st_as_sf.ppplist() returns an unnested sf object where spatial entities are POINTs.
  • st_as_sf.ppplist() can thus easily add all marks values from the ppplist elements.
  • st_as_sf.ppplist() has an additional argument .group_id_name (just an example name) which can be a character value representing the name of a new, additional, column to append to the returned object which contains an integer value (or alternatively name or something similar) of the element index in the ppplist.
  • st_as_sf.ppplist() could have an additional argument return_nested where nesting could optionally be performed.

Or, alternatively, users could manually add a column corresponding to .group_id_name to the marks elements of the ppplist and this gets then automatically rbinded with st_as_sf.ppplist(), and could then nest the sf object manually.

Would any of this be an option? As mentioned, I don't say that the output should be nested or not, but I think the marks should be added somehow.

henningte avatar Apr 04 '22 12:04 henningte

A further question: do the ppp's in the ppplist have identical windows, and if not, what do we do with them?

As mentioned, I don't know the spatstat classes well.

Neither do I and I think that without knowing enough of them, making all such decisions is premature. It was probably a mistake to put in a st_as_sf method for ppplist as it is now. You can do

tibble::tibble(lapply(spatstat.geom::solist(a=gorillas1, b=gorillas2), st_as_sf))

or use the (tidyverse equivalents of lapply) and sort things out from there.

edzer avatar Apr 04 '22 12:04 edzer

Neither do I and I think that without knowing enough of them, making all such decisions is premature. It was probably a mistake to put in a st_as_sf method for ppplist as it is now.

How would you like to proceed then? I see two simple options:

  1. If this is low priority, we can really skip this until it gets higher priority. In this case, I'd suggest that I remove the st_as_sftime() method for ppplist objects and to wait until there is a solution.
  2. If you know someone who knows spatstat objects and/or someone how regularly converts spatstat objects to sf it could be a good idea to ask someone for his/her suggesiton and preferences.

Just to write down my ideas:

A further question: do the ppp's in the ppplist have identical windows, and if not, what do we do with them?

There are two potential solutions:

  1. One could leave it as it is currently the case. For example, st_as_sf.ppp() adds the window as separate row to the sf object and adds a new column label with value "window" for the window. If there would just be another column with the element id from a ppplist object (as suggested in my previous post), this would still be consistent. (the second manual option in my previous post would not, though)
  2. Based on my restricted knowledge (and my personal preferences), I would say that there does not have to be a direct correspondence between spatstat objects and sf objects, i.e. sf objects don't have to emulate the full behavior of spatstat objects.
    I could for example imagine that (1) st_as_sf() simply drops all window information, that (2) there's a separate function to convert windows to sf objects, and (3) that one can provide as.ppp.sf() not only an sf object representing the observed points, but also another sf object representing the window.

By the way, this reveals another inconcsistency with st_as_sf.ppplist():

  1. If the elements in the ppplist objects are not named, the resulting sf objects has values "window" for column label for all rows (i.e. also the extracted points are labeled as window even though they are not).
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(spatstat.geom)
#> Warning: package 'spatstat.geom' was built under R version 4.0.5
#> Loading required package: spatstat.data
#> Warning: package 'spatstat.data' was built under R version 4.0.5
#> spatstat.geom 2.4-0

# get some ppp objects to combine in a list
gorillas1 <- gorillas
gorillas2 <- gorillas

## label inconsistency

# when elements in the ppplist are named, column label is fine:
ppplist1_sf <- st_as_sf(spatstat.geom::solist(a=gorillas1, b=gorillas2))
ppplist1_sf
#> Simple feature collection with 3 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension:     XY
#> Bounding box:  xmin: 580457.9 ymin: 674172.8 xmax: 585934 ymax: 678739.2
#> CRS:           NA
#>    label                           geom
#> 1 window POLYGON ((584712 674237.1, ...
#> 2      a MULTIPOINT ((582518.4 67688...
#> 3      b MULTIPOINT ((582518.4 67688...

# ... but when not, all rows have value "window" in column label:
ppplist2_sf <- st_as_sf(spatstat.geom::solist(gorillas1, gorillas2))
ppplist2_sf
#> Simple feature collection with 3 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension:     XY
#> Bounding box:  xmin: 580457.9 ymin: 674172.8 xmax: 585934 ymax: 678739.2
#> CRS:           NA
#>    label                           geom
#> 1 window POLYGON ((584712 674237.1, ...
#> 2 window MULTIPOINT ((582518.4 67688...
#> 3 window MULTIPOINT ((582518.4 67688...

Created on 2022-04-04 by the reprex package (v2.0.1)

  1. Already now, st_as_sf.ppplist() extracts only the window of the first element in the ppplist object:
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(spatstat.geom)
#> Warning: package 'spatstat.geom' was built under R version 4.0.5
#> Loading required package: spatstat.data
#> Warning: package 'spatstat.data' was built under R version 4.0.5
#> spatstat.geom 2.4-0

# get some ppp objects to combine in a list
gorillas1 <- gorillas
gorillas2 <- gorillas

# convert to an sf object
ppplist1_sf <- st_as_sf(spatstat.geom::solist(a=gorillas1, b=gorillas2))

## window inconcsistency

# show current window of gorillas1
gorillas1$window
#> window: polygonal boundary
#> enclosing rectangle: [580457.9, 585934] x [674172.8, 678739.2] metres

# change it
gorillas1$window <- spatstat.geom::owin()

# convert to sf
ppplist3_sf <- st_as_sf(spatstat.geom::solist(a=gorillas1, b=gorillas2))
identical(ppplist3_sf$geom[[1]], ppplist1_sf$geom[[1]]) # windows now is different (since it is taken from the first ppplist element)
#> [1] FALSE
identical(ppplist3_sf$geom[[1]], st_as_sf(gorillas1)$geom[[1]]) # window is identical only to the first ppplist element
#> [1] TRUE
identical(ppplist3_sf$geom[[1]], st_as_sf(gorillas2)$geom[[1]])
#> [1] FALSE

# note that there is only one window in an sf object converted from a ppplist:
ppplist3_sf
#> Simple feature collection with 3 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension:     XY
#> Bounding box:  xmin: 0 ymin: 0 xmax: 584945.3 ymax: 678313.5
#> CRS:           NA
#>    label                           geom
#> 1 window POLYGON ((0 0, 1 0, 1 1, 0 ...
#> 2      a MULTIPOINT ((582518.4 67688...
#> 3      b MULTIPOINT ((582518.4 67688...

Created on 2022-04-04 by the reprex package (v2.0.1)

henningte avatar Apr 04 '22 13:04 henningte

@rubak do you know whether the ppplist is something that spatstat users really use for lists of marked point patterns? If so, should list members have

  • identical marks?
  • identical window?

edzer avatar Apr 04 '22 13:04 edzer

@edzer yes marked point patterns are definitely used as members ppplist in some applications. The mark variables would typically be the same across patterns, but are not forced to be. The windows are sometimes different sub windows of the same bigger area, so they definitely don't have to be the same. I'm not sure what the best solution is here.

rubak avatar Apr 04 '22 21:04 rubak

I think we should deprecate the ppplist -> sf method, and leave this problem to the user, who will understand the problem and/or their goal better.

edzer avatar Apr 26 '22 14:04 edzer

now deprecated.

edzer avatar Mar 27 '23 11:03 edzer