vctrs
vctrs copied to clipboard
Early exit for only `TRUE` in `vec_slice()`
Useful e.g. in vec_slice(x, !is.na(x))
library(rlang)
library(vctrs)
n <- 10e6
x <- 0L + 1:n
idx <- vec_rep(TRUE, n)
bench::mark(
false_n = vec_slice(x, idx),
check = FALSE,
min_iterations = 20
)
# Before
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 false_n 29.6ms 30.6ms 28.9 76.3MB 30.3
# After
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 false_n 7.39ms 7.91ms 121. 3.73KB 0
Created on 2022-07-18 by the reprex package (v2.0.1)
I'm not quite sure we'd want to optimize for this case yet. I don't love early exits like these in general because they add one more special code path that we have to keep in mind whenever we make any changes in the future (even if the performance is nice)
I think a better pattern for your example is
if (vec_any_missing(x)) {
x <- vec_slice(x, !vec_equal_na(x))
}
Then you don't need the early exit because you only do "work" when you know there is at least 1 missing value
Closing for the reasons explained in https://github.com/r-lib/vctrs/pull/1636#issuecomment-1252300105