vctrs
Performance issues with `vec_rbind()`?
When binding many one-row tibbles, `vec_c()` is 20% to 40% faster than `vec_rbind()`. I would have expected `vec_rbind()` to be faster, as row binding seems to be its main purpose.
library(vctrs)
row_list1 <- vec_rep(vec_chop(mtcars), 1e3)
row_list10 <- vec_rep(vec_chop(mtcars), 10e3)
ptype <- vec_ptype(row_list1[[1]])
bench::mark(
  vec_c1 = vec_c(!!!row_list1, .ptype = ptype),
  vec_rbind1 = vec_rbind(!!!row_list1, .ptype = ptype),
  check = TRUE,
  iterations = 3
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_c1        151ms    217ms      4.70    8.47MB     6.26
#> 2 vec_rbind1    161ms    208ms      4.74    7.49MB     7.90
bench::mark(
  vec_c10 = vec_c(!!!row_list10, .ptype = ptype),
  vec_rbind10 = vec_rbind(!!!row_list10, .ptype = ptype),
  check = TRUE,
  iterations = 3
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_c10        1.81s    2.04s     0.507    87.7MB     1.01
#> 2 vec_rbind10    2.65s    2.72s     0.364    71.8MB     1.34
Created on 2021-10-09 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.0 (2021-05-18)
#> os macOS Big Sur 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz UTC
#> date 2021-10-09
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
#> bench 1.1.1 2020-01-13 [1] CRAN (R 4.1.0)
#> cli 3.0.1.9000 2021-10-07 [1] Github (r-lib/cli@2808311)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
#> digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
#> knitr 1.36 2021-09-29 [1] CRAN (R 4.1.0)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> pillar 1.6.3 2021-09-26 [1] CRAN (R 4.1.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> profmem 0.6.0 2020-12-13 [1] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.0)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.0)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
#> rlang 0.99.0.9000 2021-10-09 [1] Github (r-lib/rlang@d0dee64)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
#> stringi 1.7.5 2021-10-04 [1] CRAN (R 4.1.0)
#> stringr 1.4.0.9000 2021-08-23 [1] Github (tidyverse/stringr@6670a37)
#> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.0)
#> tibble 3.1.5 2021-09-30 [1] CRAN (R 4.1.0)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs * 0.3.8.9001 2021-10-09 [1] Github (r-lib/vctrs@199da1a)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
#> xfun 0.26 2021-09-14 [1] CRAN (R 4.1.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
Could this have to do with how names are handled?
For my own use case, I have many one-row tibbles, and I would like to call `vec_rbind()` internally in a package (cf. https://github.com/wlandau/crew/discussions/123). The package makes sure all the names are already consistent and correct, so I do not need any name checking or name repair. On my machine, the fastest supported name repair option is responsible for 50-60% of the execution time. It would be great to be able to disable name processing completely and cut out that overhead.
packageVersion("data.table")
#> [1] ‘1.14.8’
packageVersion("vctrs")
#> [1] ‘0.6.3’
result <- crew:::monad_tibble(crew::crew_eval(12))
list <- replicate(1e6, result, simplify = FALSE)
system.time(data.table::rbindlist(list, use.names = FALSE))
#> user system elapsed
#> 0.924 0.014 0.940
system.time(vctrs::vec_rbind(list, .name_repair = "universal_quiet"))
#> user system elapsed
#> 1.338 0.061 1.400
proffer::pprof(vctrs::vec_rbind(list, .name_repair = "universal_quiet"))
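For reference, a minimal sketch of the call shape that binds each list element as its own row, assuming that is the intent here. `vec_rbind()` takes its inputs through `...`, so a list of data frames is usually spliced with `!!!`, as in the benchmark at the top of the thread. The `one_row` and `rows` objects are hypothetical stand-ins, not the real crew result.

```r
library(vctrs)

# Hypothetical stand-in for a one-row result (not the real crew output).
one_row <- data.frame(x = 1L, y = "a")
rows <- rep(list(one_row), 5)

# Supplying the prototype up front, as in the first benchmark, means the
# common type does not have to be inferred from the inputs.
ptype <- vec_ptype(one_row)

# `!!!` splices the list into `...`, so each element is bound as one row.
bound <- vec_rbind(!!!rows, .ptype = ptype, .name_repair = "universal_quiet")

nrow(bound)
names(bound)
```

Timing this spliced form against the unspliced call on the real data might show whether name repair is still the dominant cost.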