vctrs
Quadratic memory consumption in `vec_unchop()` for `list_of` prototype
I was wondering why the new implementation of `unnest_wider()` is so slow and memory-hungry for tibble columns. It turned out to be an issue with `vec_unchop()`:
library(vctrs)
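make_list_of <- function(n) {
  # One-column tibble whose column is a list_of with n integer elements
  df <- tibble::tibble(
    x = new_list_of(vec_chop(1:n), ptype = integer())
  )
  # Split into a list of n one-row tibbles
  vec_chop(df)
}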
df_list1 <- make_list_of(1e3)
df_list2 <- make_list_of(2e3)
df_list4 <- make_list_of(4e3)
df_list8 <- make_list_of(8e3)
ptype <- vec_ptype(df_list1[[1]])
bench::mark(
  df1 = vec_unchop(df_list1, ptype = ptype),
  df2 = vec_unchop(df_list2, ptype = ptype),
  df4 = vec_unchop(df_list4, ptype = ptype),
  df8 = vec_unchop(df_list8, ptype = ptype),
  check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 df1 60.76ms 64.61ms 15.3 15.4MB 21.0
#> 2 df2 149.47ms 156.88ms 6.42 61.3MB 17.7
#> 3 df4 428ms 452.35ms 2.21 244.6MB 12.2
#> 4 df8 1.32s 1.32s 0.758 977.4MB 17.4
Created on 2021-11-16 by the reprex package (v2.0.1)
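As a rough check on the "quadratic" claim, the `mem_alloc` figures reported above can be fitted on a log-log scale. This is a minimal sketch in base R; the numbers are copied by hand from the benchmark output, and the variable names are only illustrative.

```r
# mem_alloc values copied from the benchmark output above, in MB
n   <- c(1e3, 2e3, 4e3, 8e3)
mem <- c(15.4, 61.3, 244.6, 977.4)

# Ratio between successive rows: how much memory grows when n doubles
ratios <- mem[-1] / mem[-length(mem)]
ratios

# The slope of log(mem) against log(n) estimates the growth exponent
fit <- lm(log(mem) ~ log(n))
exponent <- unname(coef(fit)[2])
exponent
```

Each doubling of `n` roughly quadruples the allocation, so the fitted exponent comes out close to 2.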
I think this is mainly because df-assign has to proxy and restore the output container at every iteration, so recursive proxy/restore would really help here: https://github.com/r-lib/vctrs/issues/1107
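To make that reasoning concrete, here is a toy cost model in base R. It is a deliberate simplification and not how vctrs is implemented: `cost_per_iteration()` and `cost_once()` are hypothetical names, and the only assumption is that every proxy/restore touches the whole output container.

```r
# Hypothetical cost model, NOT vctrs internals: counts elements touched
# when n chunks of size 1 are assigned into an output of size n.

# Proxy/restore of the whole container on every assignment
cost_per_iteration <- function(n) {
  n * n        # n assignments, each re-creating an n-element container
}

# Proxy once, assign every chunk, restore once
cost_once <- function(n) {
  n + n + n    # one proxy pass, n single-element writes, one restore pass
}

n <- c(1e3, 2e3, 4e3, 8e3)
data.frame(
  n = n,
  per_iteration = cost_per_iteration(n),
  once = cost_once(n)
)
```

Under this model, doubling `n` quadruples the per-iteration cost but only doubles the proxy-once cost, which would be consistent with the growth seen in the benchmark.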
Compare against just combining list-ofs, with no data frame involved:
library(vctrs)
make_list_of <- function(n) {
  new_list_of(as.list(1:n), ptype = integer())
}
df_list1 <- make_list_of(1e3)
df_list2 <- make_list_of(2e3)
df_list4 <- make_list_of(4e3)
df_list8 <- make_list_of(8e3)
ptype <- vec_ptype(df_list1[[1]])
bench::mark(
  df1 = vec_unchop(df_list1, ptype = ptype),
  df2 = vec_unchop(df_list2, ptype = ptype),
  df4 = vec_unchop(df_list4, ptype = ptype),
  df8 = vec_unchop(df_list8, ptype = ptype),
  check = FALSE
)
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 df1 184.23µs 235.76µs 4172. 15.4KB 12.3
#> 2 df2 442.06µs 468.91µs 2115. 15.7KB 14.6
#> 3 df4 882.36µs 928.21µs 1071. 31.3KB 12.4
#> 4 df8 1.77ms 1.89ms 512. 62.6KB 14.9
Created on 2022-02-15 by the reprex package (v2.0.1)