vctrs
`vec_c` and class `pseries` from pkg `plm`
I noticed a strange issue that appears when `dplyr` is loaded while I was implementing a subsetting method for class `pseries` in pkg `plm`. I think I have boiled it down to `vctrs::vec_c`, hence I post here. (I came from this line in `dplyr`, but it seems irrelevant for the topic: https://github.com/tidyverse/dplyr/blob/04454209ea069939d3335c43846c85c725547a89/R/lead-lag.R#L72; the issue is triggered by `dplyr` clobbering base R's `lag`, see https://github.com/tidyverse/dplyr/issues/1586, https://github.com/tidyverse/dplyr/issues/2195 - but that is not my point here.)
A `pseries` is built on top of vectors and factors; the class attribute is `c("pseries", "<basic_class>")` where `<basic_class>` = numeric, integer, ..., factor. A `pseries` carries an index attribute with as many rows as the series has elements (a data.frame with two factors and additional class `c("pindex", "data.frame")`), which needs to be subset in the same manner.
`pseries` objects have been around for a long time without a subsetting method, so subsetting dispatched to base R subsetting for vectors/factors, thus removing all pseries features. I aim to implement a `pseries` subsetting method to make the class "more complete" in the sense that subsetting a pseries results in a pseries.
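To make the attribute loss concrete, here is a minimal base R sketch. It uses a hypothetical class `toy_pseries` with made-up values as a stand-in; it is not plm's actual `pseries` constructor.

```r
# Toy stand-in for a pseries (hypothetical class "toy_pseries", not plm's).
index <- data.frame(
  firm = factor(c(1, 1, 1)),
  year = factor(c(1935, 1936, 1937))
)
x <- structure(
  c("1-1935" = 317.6, "1-1936" = 391.8, "1-1937" = 410.6),
  index = index,
  class = c("toy_pseries", "numeric")
)

# No `[.toy_pseries` method exists, so base R's default subsetting is used.
y <- x[1:2]

class(y)                   # the "toy_pseries" class is gone
is.null(attr(y, "index"))  # the index attribute was dropped
names(y)                   # names survive the default subsetting
```

Only the names are carried over by default subsetting, which is why a dedicated `[` method is needed to keep the index in sync.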
`vec_c` makes a lot of calls to `[.pseries`, which seems strange to me, as do the inputs fed to `[.pseries`. Also, the result is a mixture of the various subsetting calls performed: the index attribute seems to come from one subsetting step (subset by `integer()`), but it does not fit the numeric part returned, which seems to be taken from another call to `[.pseries` (subset by `1:3` for a 3-entry vector as in the example below).
Here is a reproducible example with dev version 2.4-1.99999 of plm (rev. 1312) and a reduced, debugging-enabled `[.pseries` method hooked in:
library(plm) # 2.4-1.99999 / rev. 1312 as provided in link above
data("Grunfeld", package = "plm")
pGrunfeld <- pdata.frame(Grunfeld)
pser_num <- pGrunfeld$inv # class is c("pseries", "numeric")
`[.pseries` <- function(x, ...) {
  ## not fully sane, reduced to illustrate
  # debug printing:
  print("[.pseries executed with input:")
  cat("\n")
  print("x = ")
  print(x)
  dots <- list(...)
  cat("\n")
  print("ellipsis: ")
  print(dots)
  cat("\n")
  # save index, to be subset and attached later on
  ix <- attr(x, "index")
  # handle names, also used to identify the rows of the index to be subset
  names_orig <- names(x)
  keep_ix_rownr <- seq_along(x) # full length row numbers of original pseries
  names(keep_ix_rownr) <- names_orig
  if(is.null(names_orig)) {
    # if no names are present, set names as integer sequence to identify
    # rows to keep in index later
    names(x) <- keep_ix_rownr
    names(keep_ix_rownr) <- keep_ix_rownr
  }
  # remove pseries features to dispatch to base R subsetting
  attr(x, "index") <- NULL
  class(x) <- setdiff(class(x), "pseries")
  result <- x[...] # actual subsetting
  keep_ix_rownr <- keep_ix_rownr[names(result)]
  if(is.null(names_orig)) names(result) <- NULL # if no names were present, null names in result
  # subset index accordingly:
  ix <- ix[keep_ix_rownr, ]
  ix <- droplevels(ix)
  # restore pseries features: class and subset index
  class(result) <- c("pseries", class(result))
  attr(result, "index") <- ix
  return(result)
}
# hook in [.pseries, overwriting the one originally in the dev version of the package
assignInNamespace("[.pseries", `[.pseries`, envir = as.environment("package:plm"))
pser_num <- pser_num[1:3] # make short to ease reading
pser_num_vec_c1 <- vctrs::vec_c(pser_num) # [.pseries executed 6x, strange inputs
pser_num_vec_c2 <- vctrs::vec_c(NA, pser_num) # [.pseries executed even 14x
str(pser_num_vec_c1) # attr. index present but is destroyed (0-row data.frame)
##### str output (stripped)
##### the 0-row data.frame seems to result from a subsetting by integer()
## 'pseries' Named num [1:3] 318 392 411
## - attr(*, "index")=Classes ‘pindex’ and 'data.frame': 0 obs. of 2 variables:
## ..$ firm: Factor w/ 0 levels:
## ..$ year: Factor w/ 0 levels:
## - attr(*, "names")= chr [1:3] "1-1935" "1-1936" "1-1937"
Any ideas?
Another thing I noticed is that `vec_c` seems too strict.
### This seems too strict:
vctrs::vec_c(1.1, pser_num)
# Error: Can't combine `..1` <double> and `..2` <pseries>.
### ... because:
class(pser_num) # c("pseries", "numeric")
typeof(pser_num) # double
Sessioninfo:
> devtools::session_info()
- Session info -----------------------------------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 4.1.1 (2021-08-10)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate German_Germany.1252
ctype German_Germany.1252
tz Europe/Berlin
date 2021-09-05
- Packages ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.1.0)
bdsmatrix 1.3-4 2020-01-13 [1] CRAN (R 4.1.0)
boot 1.3-28 2021-05-03 [2] CRAN (R 4.1.1)
broom 0.7.9 2021-07-27 [1] CRAN (R 4.1.0)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.1)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.1.0)
cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
cluster 2.1.2 2021-04-17 [2] CRAN (R 4.1.1)
collapse 1.6.5 2021-07-24 [1] CRAN (R 4.1.0)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
data.table 1.14.0 2021-02-21 [1] CRAN (R 4.1.0)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
desc 1.3.0 2021-03-05 [1] CRAN (R 4.1.0)
devtools 2.4.2 2021-06-07 [1] CRAN (R 4.1.0)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
dreamerr 1.2.3 2020-12-05 [1] CRAN (R 4.1.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
fixest 0.10.0 2021-08-31 [1] Github (lrberge/fixest@9cdd106)
foreign 0.8-81 2020-12-22 [2] CRAN (R 4.1.1)
Formula 1.2-4 2020-10-16 [1] CRAN (R 4.1.0)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
gdata 2.18.0 2017-06-06 [1] CRAN (R 4.1.0)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
ggplot2 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
gtools 3.9.2 2021-06-06 [1] CRAN (R 4.1.0)
Hmisc 4.5-0 2021-02-28 [1] CRAN (R 4.1.0)
htmlTable 2.2.1 2021-05-18 [1] CRAN (R 4.1.0)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.1.0)
jpeg 0.1-9 2021-07-24 [1] CRAN (R 4.1.0)
knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0)
lattice 0.20-44 2021-05-02 [2] CRAN (R 4.1.1)
latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.1.0)
lfe 2.8-7 2021-07-31 [1] CRAN (R 4.1.0)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
lme4 1.1-27.1 2021-06-22 [1] CRAN (R 4.1.0)
lmtest 0.9-38 2020-09-09 [1] CRAN (R 4.1.0)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
MASS 7.3-54 2021-05-03 [2] CRAN (R 4.1.1)
Matrix 1.3-4 2021-06-01 [2] CRAN (R 4.1.1)
maxLik 1.5-2 2021-07-26 [1] CRAN (R 4.1.0)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0)
mice 3.13.0 2021-01-27 [1] CRAN (R 4.1.0)
minqa 1.2.4 2014-10-09 [1] CRAN (R 4.1.0)
miscTools 0.6-26 2019-12-08 [1] CRAN (R 4.1.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
nlme 3.1-152 2021-02-04 [2] CRAN (R 4.1.1)
nloptr 1.2.2.2 2020-07-02 [1] CRAN (R 4.1.0)
nnet 7.3-16 2021-05-03 [2] CRAN (R 4.1.1)
numDeriv 2016.8-1.1 2019-06-06 [1] CRAN (R 4.1.0)
pillar 1.6.2 2021-07-29 [1] CRAN (R 4.1.0)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.1.0)
plm * 2.4-1.99999 2021-09-04 [1] R-Forge (R 4.1.1)
png 0.1-7 2013-12-03 [1] CRAN (R 4.1.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1)
rbibutils 2.2.3 2021-08-09 [1] CRAN (R 4.1.1)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.1.0)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
RcppArmadillo 0.10.6.0.0 2021-07-16 [1] CRAN (R 4.1.0)
RcppEigen 0.3.3.9.1 2020-12-17 [1] CRAN (R 4.1.0)
Rdpack 2.1.2 2021-06-01 [1] CRAN (R 4.1.0)
remotes 2.4.0 2021-06-02 [1] CRAN (R 4.1.0)
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.1)
rmarkdown 2.10 2021-08-06 [1] CRAN (R 4.1.0)
rpart 4.1-15 2019-04-12 [2] CRAN (R 4.1.1)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
rsconnect 0.8.24 2021-08-05 [1] CRAN (R 4.1.0)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
sandwich 3.0-1 2021-05-18 [1] CRAN (R 4.1.0)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
stringi 1.7.4 2021-08-25 [1] CRAN (R 4.1.1)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
survival 3.2-11 2021-04-26 [2] CRAN (R 4.1.1)
testthat 3.0.4 2021-07-01 [1] CRAN (R 4.1.0)
tibble 3.1.4 2021-08-25 [1] CRAN (R 4.1.1)
tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
usethis 2.0.1 2021-02-10 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.1)
weights 1.0.4 2021-06-10 [1] CRAN (R 4.1.0)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
xfun 0.25 2021-08-06 [1] CRAN (R 4.1.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.0)
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
zoo 1.8-9 2021-03-09 [1] CRAN (R 4.1.0)
With more complex classes like pseries, which have an attribute that must be sliced alongside the core data, package authors generally have to do a little more work to get their class to work correctly with vctrs / the tidyverse.
A lot of this information is in our vignettes: https://vctrs.r-lib.org/articles/s3-vector.html https://vctrs.r-lib.org/articles/type-size.html
`vec_slice()` is probably a simpler place to start than `vec_c()`, where you can see that your `[` method is being called correctly:
out <- vctrs::vec_slice(pser_num, c(1, 3))
#> [1] "[.pseries executed with input:"
#>
#> [1] "x = "
#> 1-1935 1-1936 1-1937
#> 317.6 391.8 410.6
#>
#> [1] "ellipsis: "
#> $i
#> [1] 1 3
out
#> 1-1935 1-1937
#> 317.6 410.6
attributes(out)$index
#> firm year
#> 1 1 1935
#> 3 1 1937
`vec_c()` is more complicated. Essentially we get the common type of the inputs, construct an output container based on that common type that has the right length, fill in the data, and then add on any attributes that came with the common type.
To get the common type, we take 0-length slices of each input, which is why you are seeing `[` being called a few times. With S3 classes that we don't know much about, this is our fallback to obtain a prototype (or ptype) for that input. You can see the ptype with `vec_ptype()`:
# it does retain the pseries class even though it says "numeric(0)"
vctrs::vec_ptype(pser_num)
#> named numeric(0)
attributes(vctrs::vec_ptype(pser_num))$index
#> [1] firm year
#> <0 rows> (or 0-length row.names)
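The idea of a 0-length slice serving as a prototype can be illustrated in base R alone. This is a sketch with a hypothetical `toy_pseries` class and a simplified `[` method that only handles positional indices; it does not reproduce what vctrs does internally.

```r
# Hypothetical toy class and `[` method, standing in for pseries.
`[.toy_pseries` <- function(x, i) {
  index <- attr(x, "index")
  data <- unclass(x)
  attr(data, "index") <- NULL
  structure(
    data[i],
    index = droplevels(index[i, , drop = FALSE]),  # slice index alongside data
    class = c("toy_pseries", "numeric")
  )
}

x <- structure(
  c(317.6, 391.8, 410.6),
  index = data.frame(firm = factor(c(1, 1, 1)), year = factor(1935:1937)),
  class = c("toy_pseries", "numeric")
)

# A 0-length slice keeps the class but has no elements and a 0-row index.
ptype <- x[integer(0)]
length(ptype)
class(ptype)
nrow(attr(ptype, "index"))
```

The slice is a valid but empty object of the class, which is what makes it usable as a type description. It also means any length-dependent attribute, like the index here, is empty in the prototype.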
When `vec_c()` has 1 input, this is the common type, so then we build up an output container from this using `vec_init()`:
out <- vctrs::vec_init(vctrs::vec_ptype(pser_num), 5)
out
#> <NA> <NA> <NA> <NA> <NA>
#> NA NA NA NA NA
attributes(out)$index
#> firm year
#> NA <NA> <NA>
#> NA.1 <NA> <NA>
#> NA.2 <NA> <NA>
#> NA.3 <NA> <NA>
#> NA.4 <NA> <NA>
Before filling up this output container, we "proxy" it and all of the inputs. Proxying generates an alternative representation of the container that contains basic atomic R types that are easily fillable at the C level. After filling, we finalize the result by "restoring" the proxy back to the original type.
By default, the proxy doesn't do anything for S3 classes we don't know about, but the restore method will copy over the attributes of the original prototype before it was proxied (because they often are static and don't depend on length).
This restore bit is where the issue is for pseries, since it doesn't know not to copy over the index from the original type. We end up copying over the index from the prototype, which has 0 rows.
ptype <- vctrs::vec_ptype(pser_num)
out <- vctrs::vec_init(ptype, 5)
out <- vctrs::vec_proxy(out)
out
#> <NA> <NA> <NA> <NA> <NA>
#> NA NA NA NA NA
attributes(out)$index
#> firm year
#> NA <NA> <NA>
#> NA.1 <NA> <NA>
#> NA.2 <NA> <NA>
#> NA.3 <NA> <NA>
#> NA.4 <NA> <NA>
# do the filling of `vec_c()` here
# now restore, copying over `ptype` attributes to `out`
out <- vctrs::vec_restore(out, ptype)
# this would normally have the data from the filling of `vec_c()`
out
#> <NA> <NA> <NA> <NA> <NA>
#> NA NA NA NA NA
# oh no, 0 row attribute
attributes(out)$index
#> [1] firm year
#> <0 rows> (or 0-length row.names)
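The mismatch can also be emulated without vctrs. The helper `restore_like_default()` below is hypothetical and heavily simplified: it assumes the default restore step amounts to copying the prototype's attributes onto the filled data while keeping the data's names. It is not vctrs code.

```r
# Hypothetical helper: simplified stand-in for a default restore step.
# Assumption: all attributes of `to` are copied over, names of `data` are kept.
restore_like_default <- function(data, to) {
  nms <- names(data)
  attributes(data) <- attributes(to)
  names(data) <- nms
  data
}

# Prototype of a toy class: 0 elements and a 0-row index.
ptype <- structure(
  numeric(0),
  index = data.frame(firm = factor(character(0)), year = factor(character(0))),
  class = c("toy_pseries", "numeric")
)

# Data as it might look after the filling step.
filled <- c("1-1935" = 317.6, "1-1936" = 391.8, "1-1937" = 410.6)

out <- restore_like_default(filled, ptype)
length(out)               # 3 elements of data ...
nrow(attr(out, "index"))  # ... but an index with 0 rows
```

This mirrors the `str()` output in the original report: three values paired with a 0-row index.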
Since `pseries` has an attribute that relies on the length and ordering of the input, we'd generally advise creating a `vec_proxy()` and `vec_restore()` method to customize these two steps of the process. The proxy could be a two column data frame, where the first column holds the data and the second column holds the index data frame. That way they get sliced and combined together and you don't have to manage them separately. The restoration method would just move the index column back as an attribute.
vec_proxy.pseries <- function(x, ...) {
x <- unclass(x)
index <- attr(x, "index", exact = TRUE)
attr(x, "index") <- NULL
vctrs::data_frame(x = x, index = index)
}
vec_restore.pseries <- function(x, to, ...) {
index <- x$index
x <- x$x
attr(x, "index") <- index
class(x) <- c("pseries", class(x))
x
}
ptype <- vctrs::vec_ptype(pser_num)
# notice the proxy is a data frame now
out <- vctrs::vec_init(vctrs::vec_ptype(pser_num), 5)
out <- vctrs::vec_proxy(out)
out
#> x index.firm index.year
#> 1 NA <NA> <NA>
#> 2 NA <NA> <NA>
#> 3 NA <NA> <NA>
#> 4 NA <NA> <NA>
#> 5 NA <NA> <NA>
# do the filling of `vec_c()` here
# now restore from data frame back to pseries
out <- vctrs::vec_restore(out, ptype)
out
#>
#> NA NA NA NA NA
attributes(out)$index
#> firm year
#> ...1 <NA> <NA>
#> ...2 <NA> <NA>
#> ...3 <NA> <NA>
#> ...4 <NA> <NA>
#> ...5 <NA> <NA>
With a proxy and restore method in place, `vec_c()` would work properly (the odd row names are a technical detail and could be cleaned up):
vctrs::vec_c(pser_num, pser_num)
#> 1-1935 1-1936 1-1937 1-1935 1-1936 1-1937
#> 317.6 391.8 410.6 317.6 391.8 410.6
attributes(vctrs::vec_c(pser_num, pser_num))$index
#> firm year
#> 1...1 1 1935
#> 2...2 1 1936
#> 3...3 1 1937
#> 1...4 1 1935
#> 2...5 1 1936
#> 3...6 1 1937
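Why the data frame proxy keeps data and index in sync can be sketched in base R. The helpers `to_proxy()` and `from_proxy()` are hypothetical and use flat columns rather than a nested index column; they only mirror the idea, not the vctrs machinery.

```r
# Hypothetical helpers mirroring the proxy/restore idea in base R.
to_proxy <- function(x) {
  index <- attr(x, "index")
  data.frame(x = as.numeric(unclass(x)), firm = index$firm, year = index$year)
}
from_proxy <- function(p) {
  structure(
    p$x,
    index = data.frame(firm = p$firm, year = p$year),
    class = c("toy_pseries", "numeric")
  )
}

x <- structure(
  c(317.6, 391.8, 410.6),
  index = data.frame(firm = factor(c(1, 1, 1)), year = factor(1935:1937)),
  class = c("toy_pseries", "numeric")
)

# Slicing and combining happen on whole rows, so data and index move together.
combined <- from_proxy(rbind(to_proxy(x), to_proxy(x)[c(1, 3), ]))
length(combined)
nrow(attr(combined, "index"))
as.character(attr(combined, "index")$year)
```

Because every operation acts on rows of the proxy, there is no separate bookkeeping step where the index could fall out of step with the values.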
Thank you for the in-depth explanation, very instructive! I read a bit in the vignettes before posting. I reckon one's package would need to hard-depend on `vctrs` to implement all this? Depending on yet another package is what one would typically avoid. I could imagine working around the dependency by supplying my own generic in the package, but I am not sure whether the double-dispatch mechanism would work.
I would not expect `vec_c` to return the "correct" result for a `pseries`. Intuitively, I assumed `vec_c` would fall back to base `c` if an unknown class or a base R class is encountered (in the first or a later entry of the class attribute, as in `c("pseries", "numeric")`). That is also my reading of `?vec_c`: "If inputs inherit from a common class hierarchy, vec_c() falls back to base::c() if there exists a c() method implemented for this class hierarchy."
Wouldn't mimicking base R behaviour be what most would assume (as we are used to losing attributes etc.)?
> Intuitively, I assumed vec_c would fall back to base c if an unknown class or a base R class is encountered (in the first or a later entry of the class attribute, as in c("pseries", "numeric")). That is also my reading of ?vec_c
Yup, but there is no `c()` method for `pseries`, so the common type methods must be implemented.
vctrs:::s3_get_method("pseries", "c")
#> NULL
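For reference, a `c()` method of the kind the quoted `?vec_c` passage refers to could look roughly like this in base R. It is a sketch for a hypothetical `toy_pseries` class and assumes every input is a `toy_pseries`. Whether `vec_c()` would then take the fallback path for `pseries` is not verified here.

```r
# Hypothetical c() method: combine the data and row-bind the indexes.
# Assumption: all arguments are toy_pseries objects carrying an index.
c.toy_pseries <- function(...) {
  args <- list(...)
  data <- unlist(lapply(args, function(a) as.numeric(unclass(a))))
  index <- do.call(rbind, lapply(args, attr, which = "index"))
  structure(data, index = index, class = c("toy_pseries", "numeric"))
}

x <- structure(
  c(317.6, 391.8, 410.6),
  index = data.frame(firm = factor(c(1, 1, 1)), year = factor(1935:1937)),
  class = c("toy_pseries", "numeric")
)

both <- c(x, x)
length(both)
nrow(attr(both, "index"))
```

Mixed inputs such as `c(1.1, x)` would dispatch on the first argument and bypass this method entirely, so a real implementation would need to decide how to treat plain vectors.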