vctrs
vctrs copied to clipboard
Early exit for `vec_in()` and `vec_matches()` when one input is empty
Closes #579.
I think this early exit is less problematic than the one in #1591.
x <- 1:10e3 + 0L
bench::mark(
one_x = vctrs::vec_in(1L, x),
empty_x = vctrs::vec_in(integer(), x),
x_one = vctrs::vec_in(x, 1L),
x_empty = vctrs::vec_in(x, integer()),
check = FALSE
)
# Before
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 one_x 82µs 104.4µs 8941. 2.05MB 15.1
#> 2 empty_x 81.8µs 91.6µs 9809. 103.17KB 16.3
#> 3 x_one 38.1µs 43.8µs 20656. 78.23KB 32.5
#> 4 x_empty 31.6µs 36.4µs 24931. 78.23KB 34.5
# After
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 one_x 79.59µs 103.57µs 7803. 2.05MB 14.2
#> 2 empty_x 1.98µs 2.23µs 342233. 0B 0
#> 3 x_one 38.63µs 45.63µs 18206. 78.23KB 24.9
#> 4 x_empty 4.76µs 7.63µs 90908. 39.11KB 63.7
bench::mark(
one_x = vctrs::vec_match(1L, x),
empty_x = vctrs::vec_match(integer(), x),
x_one = vctrs::vec_match(x, 1L),
x_empty = vctrs::vec_match(x, integer()),
check = FALSE
)
# Before
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 one_x 81µs 89.1µs 10280. 106.8KB 21.9
#> 2 empty_x 81.2µs 88.8µs 10510. 103.2KB 18.4
#> 3 x_one 34.8µs 39.8µs 23007. 78.2KB 36.2
#> 4 x_empty 29.4µs 34.4µs 26560. 78.2KB 38.6
# After
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 one_x 81.26µs 90.84µs 9594. 106.8KB 21.1
#> 2 empty_x 2.02µs 2.27µs 310860. 0B 0
#> 3 x_one 34.77µs 40.86µs 19220. 78.2KB 26.5
#> 4 x_empty 4.88µs 7.65µs 87792. 39.1KB 70.3
Created on 2022-08-31 with reprex v2.0.2
In principle I'm not sure early exits are worth it when it isn't immediately clear that they are safe/correct, since they only speed up the empty case.
Hi @mgirlich. Thanks for looking into this and the other 2 early exit PRs. However we are a bit uncomfortable with implementing these early exits as they make the code harder to reason about and require more unit tests to make sure the different code paths do not diverge.
It's partly my fault for having opened the issue #579. Sorry for closing this and the other PRs and thanks again for the effort put into these PRs.
Don't worry. I learned something while looking at the vctrs code and I also learned about the issues with early exits. It might be nice to make other developers aware that there are no early exits and patterns to use instead.
Just want to second Lionel that we really appreciate your efforts in trying to improve vctrs!