vctrs icon indicating copy to clipboard operation
vctrs copied to clipboard

Early exit for `vec_in()` and `vec_matches()` when one input is empty

Open mgirlich opened this issue 1 year ago • 1 comments

Closes #579.

I think this early exit is less problematic than the one in #1591.

x <- 1:10e3 + 0L

bench::mark(
  one_x = vctrs::vec_in(1L, x),
  empty_x = vctrs::vec_in(integer(), x),
  x_one = vctrs::vec_in(x, 1L),
  x_empty = vctrs::vec_in(x, integer()),
  check = FALSE
)

# Before
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 one_x          82µs  104.4µs     8941.    2.05MB     15.1
#> 2 empty_x      81.8µs   91.6µs     9809.  103.17KB     16.3
#> 3 x_one        38.1µs   43.8µs    20656.   78.23KB     32.5
#> 4 x_empty      31.6µs   36.4µs    24931.   78.23KB     34.5

# After
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 one_x       79.59µs 103.57µs     7803.    2.05MB     14.2
#> 2 empty_x      1.98µs   2.23µs   342233.        0B      0  
#> 3 x_one       38.63µs  45.63µs    18206.   78.23KB     24.9
#> 4 x_empty      4.76µs   7.63µs    90908.   39.11KB     63.7

bench::mark(
  one_x = vctrs::vec_match(1L, x),
  empty_x = vctrs::vec_match(integer(), x),
  x_one = vctrs::vec_match(x, 1L),
  x_empty = vctrs::vec_match(x, integer()),
  check = FALSE
)

# Before
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 one_x          81µs   89.1µs    10280.   106.8KB     21.9
#> 2 empty_x      81.2µs   88.8µs    10510.   103.2KB     18.4
#> 3 x_one        34.8µs   39.8µs    23007.    78.2KB     36.2
#> 4 x_empty      29.4µs   34.4µs    26560.    78.2KB     38.6

# After
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 one_x       81.26µs  90.84µs     9594.   106.8KB     21.1
#> 2 empty_x      2.02µs   2.27µs   310860.        0B      0  
#> 3 x_one       34.77µs  40.86µs    19220.    78.2KB     26.5
#> 4 x_empty      4.88µs   7.65µs    87792.    39.1KB     70.3

Created on 2022-08-31 with reprex v2.0.2

mgirlich avatar Aug 31 '22 11:08 mgirlich

In principle I'm not sure early exits are worth it when it isn't immediately clear that they are safe/correct, since they only speed up the empty case.

lionel- avatar Sep 01 '22 14:09 lionel-

Hi @mgirlich. Thanks for looking into this and the other 2 early exit PRs. However we are a bit uncomfortable with implementing these early exits as they make the code harder to reason about and require more unit tests to make sure the different code paths do not diverge.

It's partly my fault for having opened the issue #579. Sorry for closing this and the other PRs and thanks again for the effort put into these PRs.

lionel- avatar Sep 20 '22 12:09 lionel-

Don't worry. I learned something while looking at the vctrs code and I also learned about the issues with early exits. It might be nice to make other developers aware that there are no early exits and patterns to use instead.

mgirlich avatar Sep 20 '22 13:09 mgirlich

Just want to second Lionel that we really appreciate your efforts in trying to improve vctrs!

DavisVaughan avatar Sep 20 '22 14:09 DavisVaughan