vctrs
C implementations of `vec_case_when()` and `vec_case_match()`
And possibly vec_if_else() because it would be nice for, say, ggplot2 to be able to use this.
Consider whether we can figure out some kind of 1:1 interface that doesn't always require a list for values and haystacks, so that it can nicely replace plyr::mapvalues() (https://github.com/tidyverse/dplyr/issues/7027). The list approach is very powerful and general because it allows for 1:m and m:1 replacements, but it is not always needed.
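To make the 1:1 idea concrete, here is a minimal base R sketch of what such an interface could look like. The name `replace_values_sketch()` and its `from`/`to` arguments are hypothetical and are not an existing vctrs or dplyr function; it only illustrates the "parallel vectors instead of lists" shape, keeping unmatched values unchanged the way plyr::mapvalues() does.

```r
# Hypothetical helper for illustration only; not part of vctrs or dplyr.
# `from` and `to` are parallel vectors: from[i] is replaced by to[i].
replace_values_sketch <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  loc <- match(x, from)      # position of each element of x in `from`, NA if absent
  hit <- !is.na(loc)         # which elements of x have a replacement
  out <- x
  out[hit] <- to[loc[hit]]   # unmatched elements are left as they were
  out
}

res <- replace_values_sketch(c("x", "y", "z"), from = c("xx", "y"), to = c("x", "yy"))
# res is c("x", "yy", "z"): only "y" has a match in `from`
```

A real implementation would presumably also take a common type of `x` and `to` and handle a default for unmatched values; this sketch skips both.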
Is this still planned? I saw this was the proposed solution for replacing splicing in dplyr::recode(). Using recode() is slowing down my code because of lifecycle, so I wondered if I could rely on a faster vctrs implementation sometime in the future.
Cf. https://github.com/tidyverse/dplyr/issues/6623#issuecomment-1362887413
Recreating the formulas programmatically can be a bit expensive, as the benchmarks below show:
# manually created
a_formula <- c("xx" ~ "x", "y" ~ "yy")
dplyr::case_match(
c("x", "y", "z"),
"zz" ~ "a",
!!!a_formula,
.default = NA_character_
)
#> [1] NA "yy" NA
# What I have
a_list <- c("xx" = "x", "y" = "yy")
dplyr::recode(
c("x", "y", "z"),
!!!a_list,
.default = NA_character_
)
#> [1] NA "yy" NA
# programmatically recreated
a_formula_from_list <- purrr::map2(
names(a_list),
unname(a_list),
rlang::new_formula
)
dplyr::case_match(
c("x", "y", "z"),
"zz" ~ "a",
!!!a_formula_from_list,
.default = NA_character_
)
#> [1] NA "yy" NA
bench::mark(
recode = dplyr::recode(
c("x", "y", "z"),
!!!a_list,
.default = NA_character_
),
casematch_program = {
a_formula_from_list <- purrr::map2(
names(a_list),
unname(a_list),
rlang::new_formula
)
dplyr::case_match(
c("x", "y", "z"),
"zz" ~ "a",
!!!a_formula_from_list,
.default = NA_character_
)
},
casematch_regular = dplyr::case_match(
c("x", "y", "z"),
"zz" ~ "a",
!!!a_formula,
.default = NA_character_
)
)
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 recode 802µs 888µs 988. 0B 8.43
#> 2 casematch_program 366µs 385µs 2355. 1.3KB 10.4
#> 3 casematch_regular 295µs 311µs 2860. 1.05KB 10.4
# programmatically recreating the formulas can become expensive
Created on 2024-05-07 with reprex v2.1.0
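For this particular case (a 1:1 named vector with `NA` as the default), a plain named-vector lookup in base R gives the same result without recode() or any formula construction. This is only a sketch of a possible stopgap, not something benchmarked here, and it does no type checking.

```r
# Same mapping as in the reprex: names are the old values, elements the new ones.
a_list <- c("xx" = "x", "y" = "yy")
x <- c("x", "y", "z")

# Indexing by name returns NA for names that are not present,
# which plays the role of `.default = NA_character_`.
looked_up <- unname(a_list[x])
# looked_up is c(NA, "yy", NA)
```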
Unfortunately it is taking us longer than expected to find time for a vctrs release, but this is definitely still something I want to add, as I think a lot of people would like a low-level, type-stable vec_if_else() that doesn't need dplyr (particularly ggplot2).
In fact @DavisVaughan, I saw some discussion on the R mailing lists suggesting that a new version of the "if.else" function, along the lines of dplyr::if_else(), would be of value in base R.
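As a rough illustration of what "type stable" means here, the base R sketch below contrasts ifelse(), which drops the class of its inputs, with a hypothetical `if_else_sketch()` that preserves it. The helper name is made up for this example, and the exact-class check is a deliberately strict stand-in for the common-type rules that vctrs would apply; it is not the vec_if_else() implementation.

```r
# Hypothetical helper for illustration only; not the vctrs implementation.
if_else_sketch <- function(condition, true, false) {
  stopifnot(is.logical(condition))
  # Assumption: require identical classes instead of computing a common type.
  if (!identical(class(true), class(false))) {
    stop("`true` and `false` must have the same class.")
  }
  n <- length(condition)
  true <- rep(true, length.out = n)    # recycle to the size of `condition`
  false <- rep(false, length.out = n)
  out <- false                         # start from `false`, keeping its class
  yes <- which(condition)              # which() drops missing conditions
  out[yes] <- true[yes]
  out[is.na(condition)] <- NA          # a missing condition gives a missing result
  out
}

d1 <- as.Date("2024-05-07")
d2 <- as.Date("2024-05-08")

base_class <- class(ifelse(c(TRUE, FALSE), d1, d2))   # "numeric": the Date class is lost
sketch <- if_else_sketch(c(TRUE, FALSE, NA), d1, d2)  # still a Date vector
```

The point of the comparison is only that the output type depends on `true` and `false` rather than on `condition`, and that incompatible inputs fail loudly instead of being silently coerced.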
@jrosell we've actually got vec_if_else() in https://github.com/r-lib/vctrs/pull/2030 as of last week
The "atomic" path might be interesting to base R. It is hyperoptimized and absurdly fast and memory efficient compared to base R's current approach.
The "generic" path is pretty vctrs specific and the base R fallback would be different.
data.table's implementation is faster because it uses multiple threads in some cases. But note the character vector benchmark, where we are faster than they are. That's a case where they can't use multiple threads, which I think suggests that our implementation is a bit faster in general on a single thread.
The link to the discussion: https://stat.ethz.ch/pipermail/r-devel/2025-July/084096.html
Closed by https://github.com/r-lib/vctrs/pull/2024 and https://github.com/r-lib/vctrs/pull/2027