sf
sf copied to clipboard
`st_as_sf.ppplist()` does not add `marks` columns from `spatstat` objects and extracts only the window of the first `ppplist` element
st_as_sf.ppplist()
does not add marks
columns from the spatstat
objects in the ppplist
, i.e. if the objects in the ppplist
have any additional variables in the marks
element, these will be dropped.
I don't know if this behavior is expected given that the conversion methods to sf
from individual spatstat
classes (e.g. ppp
, psp
) add the variables from the marks
element.
Here is an example:
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(spatstat.geom)
#> Warning: package 'spatstat.geom' was built under R version 4.0.5
#> Loading required package: spatstat.data
#> Warning: package 'spatstat.data' was built under R version 4.0.5
#> Registered S3 method overwritten by 'spatstat.geom':
#> method from
#> print.boxx cli
#> spatstat.geom 2.4-0
# get some ppp objects to combine in a list
gorillas1 <- gorillas
gorillas2 <- gorillas
# show mark colnames
colnames(gorillas1$marks)
#> [1] "group" "season" "date"
# convert to sf: problem: marks are dropped
x_sf_ppplist <- st_as_sf(spatstat.geom::solist(gorillas1, gorillas2))
colnames(x_sf_ppplist)
#> [1] "label" "geom"
# this contrasts the behavior when only one ppp object is converted to sf (marks are added as columns)
x_sf_ppp <- st_as_sf(gorillas1)
colnames(x_sf_ppp)
#> [1] "group" "season" "date" "label" "geom"
Created on 2022-04-03 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.0.1 (2020-06-06)
#> os Windows 10 x64
#> system i386, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2022-04-03
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.7 2020-05-13 [1] CRAN (R 4.0.0)
#> blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.0)
#> class 7.3-17 2020-04-26 [2] CRAN (R 4.0.1)
#> classInt 0.4-3 2020-04-07 [1] CRAN (R 4.0.0)
#> cli 3.1.0 2021-10-27 [1] CRAN (R 4.0.5)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.0.5)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.0)
#> deldir 1.0-6 2021-10-23 [1] CRAN (R 4.0.5)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.0.5)
#> e1071 1.7-3 2019-11-26 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#> KernSmooth 2.23-17 2020-04-26 [2] CRAN (R 4.0.1)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.0.5)
#> lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.1)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.0.5)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> Matrix 1.2-18 2019-11-27 [2] CRAN (R 4.0.1)
#> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.0.5)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.0.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.8 2022-01-13 [1] CRAN (R 4.0.5)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.0.5)
#> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.0.5)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.0.5)
#> rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> sf * 1.0-8 2022-03-26 [1] Github (r-spatial/sf@649b0d7)
#> spatstat.data * 2.1-4 2022-03-29 [1] CRAN (R 4.0.5)
#> spatstat.geom * 2.4-0 2022-03-29 [1] CRAN (R 4.0.5)
#> spatstat.utils 2.3-0 2021-12-12 [1] CRAN (R 4.0.5)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> styler 1.3.2 2020-02-23 [1] CRAN (R 4.0.2)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.0.5)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
#> units 0.8-0 2022-03-04 [1] Github (r-quantities/units@92e363a)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.0.5)
#> xfun 0.29 2021-12-14 [1] CRAN (R 4.0.5)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/henni/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.1/library
I would assume (or suggest) that st_as_sf.ppplist()
could do the following things: (1) Apply the respective st_as_sf()
method to each individual object in the ppplist
object. (2) Combine the result using rbind
. Would this be a good idea?
Here is a suggested implementation (note that I don't know the spatstat
classes well and therefore I can't say if there are any pitfalls I have overlooked):
# suggested solution: Why not use lapply to apply st_as_sf.x (where `x` is the respective method for the current object) and rbind.sf to combine the result?
st_as_sf.ppplist = function(x, ...) {
ret = lapply(x, st_as_sf)
# fill missing columns
ret_column_names = unique(unlist(lapply(ret, colnames)))
ret_missing_columns =
lapply(seq_along(ret), function(i) {
i_column_names_missing = setdiff(ret_column_names, colnames(ret[[i]]))
i_missing = lapply(i_column_names_missing, function(y) {
y_ret = data.frame(x = rep(NA, nrow(ret[[i]])))
colnames(y_ret) = y
y_ret
})
i_missing = do.call("cbind", i_missing)
})
ret = lapply(seq_along(ret), function(i) {
cbind(ret[[i]], ret_missing_columns[[i]])
})
# combine
do.call("rbind", ret)
}
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(spatstat.geom)
#> Warning: package 'spatstat.geom' was built under R version 4.0.5
#> Loading required package: spatstat.data
#> Warning: package 'spatstat.data' was built under R version 4.0.5
#> Registered S3 method overwritten by 'spatstat.geom':
#> method from
#> print.boxx cli
#> spatstat.geom 2.4-0
# get some ppp objects to combine in a list
gorillas1 <- gorillas
gorillas2 <- gorillas
# add some additional variables to check that binding marks also works when column names differ
gorillas1 <- gorillas
gorillas1$marks$a <- seq_len(nrow(gorillas1$marks))
gorillas2 <- gorillas
gorillas2$marks$b <- rev(seq_len(nrow(gorillas2$marks)))
x_sf_ppplist <- st_as_sf(spatstat.geom::solist(gorillas1, gorillas2))
colnames(x_sf_ppplist)
#> [1] "group" "season" "date" "a" "label" "b" "geom"
Created on 2022-04-03 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.0.1 (2020-06-06)
#> os Windows 10 x64
#> system i386, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2022-04-03
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.7 2020-05-13 [1] CRAN (R 4.0.0)
#> blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.0)
#> class 7.3-17 2020-04-26 [2] CRAN (R 4.0.1)
#> classInt 0.4-3 2020-04-07 [1] CRAN (R 4.0.0)
#> cli 3.1.0 2021-10-27 [1] CRAN (R 4.0.5)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.0.5)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.0)
#> deldir 1.0-6 2021-10-23 [1] CRAN (R 4.0.5)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.0.5)
#> e1071 1.7-3 2019-11-26 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#> KernSmooth 2.23-17 2020-04-26 [2] CRAN (R 4.0.1)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.0.5)
#> lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.1)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.0.5)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> Matrix 1.2-18 2019-11-27 [2] CRAN (R 4.0.1)
#> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.0.5)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.0.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.8 2022-01-13 [1] CRAN (R 4.0.5)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.0.5)
#> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.0.5)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.0.5)
#> rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> sf * 1.0-8 2022-03-26 [1] Github (r-spatial/sf@649b0d7)
#> spatstat.data * 2.1-4 2022-03-29 [1] CRAN (R 4.0.5)
#> spatstat.geom * 2.4-0 2022-03-29 [1] CRAN (R 4.0.5)
#> spatstat.utils 2.3-0 2021-12-12 [1] CRAN (R 4.0.5)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> styler 1.3.2 2020-02-23 [1] CRAN (R 4.0.2)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.0.5)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
#> units 0.8-0 2022-03-04 [1] Github (r-quantities/units@92e363a)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.0.5)
#> xfun 0.29 2021-12-14 [1] CRAN (R 4.0.5)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/henni/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.1/library
Currently a MULTIPOINT
is created for each item in the ppp list. Wouldn't a nested tibble make more sense - what if two group items have different number and/or names of marks? How to find out which group a point belonged to after simply rbind
-ing the data.frame's?
Ok, that is a point I have not considered.
As mentioned, I don't know the spatstat
classes well. I see (based on this restricted background knowledge) that it might be sensible to nest the individual elements in a ppplist
. But in my opinion, this currently creates one problem and one unintuitive behavior:
- The problem:
marks
are completely dropped withst_as_sf.ppplist()
. - The unintuitive behavior:
st_as_sf.ppp()
returns a non-MULTIPOINT
, butst_as_sf.ppplist()
does not.
I think the problem should be solved. But I think that whether a nested tibble
is returned or not is more a matter of taste (based on my restricted knowledge about spatstat
classes) - and perhaps it's only me finding this unintuitive at first. I personally would prefer (but I don't imply this is the best solution) the following solution:
-
st_as_sf.ppplist()
returns an unnestedsf
object where spatial entities arePOINT
s. -
st_as_sf.ppplist()
can thus easily add allmarks
values from theppplist
elements. -
st_as_sf.ppplist()
has an additional argument.group_id_name
(just an example name) which can be a character value representing the name of a new, additional, column to append to the returned object which contains an integer value (or alternatively name or something similar) of the element index in theppplist
. -
st_as_sf.ppplist()
could have an additional argumentreturn_nested
where nesting could optionally be performed.
Or, alternatively, users could manually add a column corresponding to .group_id_name
to the marks
elements of the ppplist
and this gets then automatically rbind
ed with st_as_sf.ppplist()
, and could then nest the sf
object manually.
Would any of this be an option? As mentioned, I don't say that the output should be nested or not, but I think the marks
should be added somehow.
A further question: do the ppp's in the ppplist have identical windows, and if not, what do we do with them?
As mentioned, I don't know the spatstat classes well.
Neither do I and I think that without knowing enough of them, making all such decisions is premature. It was probably a mistake to put in a st_as_sf
method for ppplist
as it is now. You can do
tibble::tibble(lapply(spatstat.geom::solist(a=gorillas1, b=gorillas2), st_as_sf))
or use the (tidyverse
equivalents of lapply
) and sort things out from there.
Neither do I and I think that without knowing enough of them, making all such decisions is premature. It was probably a mistake to put in a
st_as_sf
method forppplist
as it is now.
How would you like to proceed then? I see two simple options:
- If this is low priority, we can really skip this until it gets higher priority. In this case, I'd suggest that I remove the
st_as_sftime()
method forppplist
objects and to wait until there is a solution. - If you know someone who knows
spatstat
objects and/or someone how regularly convertsspatstat
objects tosf
it could be a good idea to ask someone for his/her suggesiton and preferences.
Just to write down my ideas:
A further question: do the ppp's in the ppplist have identical windows, and if not, what do we do with them?
There are two potential solutions:
- One could leave it as it is currently the case. For example,
st_as_sf.ppp()
adds the window as separate row to thesf
object and adds a new columnlabel
with value"window"
for the window. If there would just be another column with the element id from appplist
object (as suggested in my previous post), this would still be consistent. (the second manual option in my previous post would not, though) - Based on my restricted knowledge (and my personal preferences), I would say that there does not have to be a direct correspondence between
spatstat
objects andsf
objects, i.e.sf
objects don't have to emulate the full behavior ofspatstat
objects.
I could for example imagine that (1)st_as_sf()
simply drops all window information, that (2) there's a separate function to convert windows tosf
objects, and (3) that one can provideas.ppp.sf()
not only ansf
object representing the observed points, but also anothersf
object representing the window.
By the way, this reveals another inconcsistency with st_as_sf.ppplist()
:
- If the elements in the
ppplist
objects are not named, the resultingsf
objects has values"window"
for columnlabel
for all rows (i.e. also the extracted points are labeled as window even though they are not).
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(spatstat.geom)
#> Warning: package 'spatstat.geom' was built under R version 4.0.5
#> Loading required package: spatstat.data
#> Warning: package 'spatstat.data' was built under R version 4.0.5
#> spatstat.geom 2.4-0
# get some ppp objects to combine in a list
gorillas1 <- gorillas
gorillas2 <- gorillas
## label inconsistency
# when elements in the ppplist are named, column label is fine:
ppplist1_sf <- st_as_sf(spatstat.geom::solist(a=gorillas1, b=gorillas2))
ppplist1_sf
#> Simple feature collection with 3 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: 580457.9 ymin: 674172.8 xmax: 585934 ymax: 678739.2
#> CRS: NA
#> label geom
#> 1 window POLYGON ((584712 674237.1, ...
#> 2 a MULTIPOINT ((582518.4 67688...
#> 3 b MULTIPOINT ((582518.4 67688...
# ... but when not, all rows have value "window" in column label:
ppplist2_sf <- st_as_sf(spatstat.geom::solist(gorillas1, gorillas2))
ppplist2_sf
#> Simple feature collection with 3 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: 580457.9 ymin: 674172.8 xmax: 585934 ymax: 678739.2
#> CRS: NA
#> label geom
#> 1 window POLYGON ((584712 674237.1, ...
#> 2 window MULTIPOINT ((582518.4 67688...
#> 3 window MULTIPOINT ((582518.4 67688...
Created on 2022-04-04 by the reprex package (v2.0.1)
- Already now,
st_as_sf.ppplist()
extracts only the window of the first element in theppplist
object:
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(spatstat.geom)
#> Warning: package 'spatstat.geom' was built under R version 4.0.5
#> Loading required package: spatstat.data
#> Warning: package 'spatstat.data' was built under R version 4.0.5
#> spatstat.geom 2.4-0
# get some ppp objects to combine in a list
gorillas1 <- gorillas
gorillas2 <- gorillas
# convert to an sf object
ppplist1_sf <- st_as_sf(spatstat.geom::solist(a=gorillas1, b=gorillas2))
## window inconcsistency
# show current window of gorillas1
gorillas1$window
#> window: polygonal boundary
#> enclosing rectangle: [580457.9, 585934] x [674172.8, 678739.2] metres
# change it
gorillas1$window <- spatstat.geom::owin()
# convert to sf
ppplist3_sf <- st_as_sf(spatstat.geom::solist(a=gorillas1, b=gorillas2))
identical(ppplist3_sf$geom[[1]], ppplist1_sf$geom[[1]]) # windows now is different (since it is taken from the first ppplist element)
#> [1] FALSE
identical(ppplist3_sf$geom[[1]], st_as_sf(gorillas1)$geom[[1]]) # window is identical only to the first ppplist element
#> [1] TRUE
identical(ppplist3_sf$geom[[1]], st_as_sf(gorillas2)$geom[[1]])
#> [1] FALSE
# note that there is only one window in an sf object converted from a ppplist:
ppplist3_sf
#> Simple feature collection with 3 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: 0 ymin: 0 xmax: 584945.3 ymax: 678313.5
#> CRS: NA
#> label geom
#> 1 window POLYGON ((0 0, 1 0, 1 1, 0 ...
#> 2 a MULTIPOINT ((582518.4 67688...
#> 3 b MULTIPOINT ((582518.4 67688...
Created on 2022-04-04 by the reprex package (v2.0.1)
@rubak do you know whether the ppplist
is something that spatstat
users really use for lists of marked point patterns? If so, should list members have
- identical marks?
- identical window?
@edzer yes marked point patterns are definitely used as members ppplist
in some applications. The mark variables would typically be the same across patterns, but are not forced to be. The windows are sometimes different sub windows of the same bigger area, so they definitely don't have to be the same. I'm not sure what the best solution is here.
I think we should deprecate the ppplist
-> sf
method, and leave this problem to the user, who will understand the problem and/or their goal better.
now deprecated.