dplyr
Draft `case_match()` and `vec_case_match()`
case_match() is a variant of case_when() that takes a primary input, .x, and then a series of formulas where the LHSs of each formula are values to match against .x rather than logical vectors. The LHSs get turned into logical conditions by vec_in(), and then the results are passed on to vec_case_when().
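As a rough illustration of that translation (a sketch, not the PR's implementation), each LHS can be turned into a logical vector with `vctrs::vec_in()` and handed to `case_when()`, which stands in here for `vec_case_when()`. The input `x` and the names `is_a`, `is_b`, and `manual` are made up for the example.

```r
library(dplyr)
library(vctrs)

x <- c("a", "b", "c", "a")

# Step 1: turn each LHS into a logical condition with vec_in().
is_a <- vec_in(x, "a")
is_b <- vec_in(x, "b")

# Step 2: hand the conditions and their values to case_when().
# Inputs that match no condition fall through to NA.
manual <- case_when(
  is_a ~ "Apple",
  is_b ~ "Banana"
)
manual
#> [1] "Apple"  "Banana" NA       "Apple"
```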
It technically closes https://github.com/tidyverse/funs/issues/60
This would function as a direct successor to recode(), which is already in the "questioning" lifecycle stage and has an awkward interface for anything except character vectors (and even there it can be odd).
char_vec <- sample(c("a", "b", "c"), 10, replace = TRUE)
recode(char_vec, a = "Apple", b = "Banana")
#> [1] "Banana" "Banana" "c" "Banana" "c" "Banana"
#> [7] "Apple" "Banana" "c" "Banana"
case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana",
  .default = char_vec
)
#> [1] "Banana" "Banana" "c" "Banana" "c" "Banana"
#> [7] "Apple" "Banana" "c" "Banana"
recode(char_vec, a = "Apple", b = "Banana", .default = NA_character_)
#> [1] "Banana" "Banana" NA "Banana" NA "Banana"
#> [7] "Apple" "Banana" NA "Banana"
case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana"
)
#> [1] "Banana" "Banana" NA "Banana" NA "Banana"
#> [7] "Apple" "Banana" NA "Banana"
# `case_match()` is more general and works cleanly
# with more than just character vectors
num_vec <- c(1:4, NA)
recode(num_vec, `1` = "o", `2` = "e", `3` = "o", `4` = "e", .missing = "m")
#> [1] "o" "e" "o" "e" "m"
case_match(
  num_vec,
  c(1, 3) ~ "o",
  c(2, 4) ~ "e",
  NA ~ "m"
)
#> [1] "o" "e" "o" "e" "m"
# More of a programmatic usage
level_key <- c(a = "apple", b = "banana", c = "carrot")
recode(char_vec, !!!level_key)
#> [1] "banana" "banana" "carrot" "banana" "carrot" "banana"
#> [7] "apple" "banana" "carrot" "banana"
vec_case_match(
  needles = char_vec,
  haystacks = as.list(names(level_key)),
  values = as.list(level_key),
  default = char_vec
)
#> [1] "banana" "banana" "carrot" "banana" "carrot" "banana"
#> [7] "apple" "banana" "carrot" "banana"
I still think a replace_match() would be useful here, like:
# type stable replacement wrapper around case_match()
replace_match <- function(.x, ...) {
  ptype <- vec_ptype(.x)
  ptype <- vec_ptype_finalise(ptype)
  case_match(.x = .x, ..., .default = .x, .ptype = ptype)
}
# very close to compactness of recode()
replace_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana"
)
# instead of
case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana",
  .default = char_vec
)
replace_match() could also be used in place of a match-like version of na_if():
x <- c("a", "NA", "NaN", "no")
replace_match(x, c("NA", "NaN", "no") ~ NA)
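For comparison, here is a sketch of how that call relates to `na_if()`. It assumes a dplyr version whose `case_match()` accepts the `.default` and `.ptype` arguments used in this PR. `replace_match()` is the hypothetical wrapper proposed above, repeated so the block runs on its own; `cleaned` and `chained` are illustrative names.

```r
library(dplyr)
library(vctrs)

# Hypothetical wrapper from this thread, not an exported dplyr function.
replace_match <- function(.x, ...) {
  ptype <- vec_ptype_finalise(vec_ptype(.x))
  case_match(.x = .x, ..., .default = .x, .ptype = ptype)
}

x <- c("a", "NA", "NaN", "no")

# One call replaces every listed sentinel value with a missing value.
cleaned <- replace_match(x, c("NA", "NaN", "no") ~ NA)

# With na_if(), the same result here takes one call per sentinel value.
chained <- na_if(na_if(na_if(x, "NA"), "NaN"), "no")

identical(cleaned, chained)
```

Because the wrapper fixes `.ptype` to the type of `.x`, the logical `NA` on the right-hand side is cast to character and the result keeps the input's type.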
In forcats, we could have fct_case_match() as a successor to recode_factor(), but its interface would probably be the other way around, like:
fct_case_match(
  .x,
  odd = c(1, 3),
  even = c(2, 4),
  ordered = FALSE
)
fct_case_when(
  odd = .x %in% c(1, 3),
  even = .x %in% c(2, 4),
  ordered = FALSE
)
A better name for this might be case_switch(), i.e. it is a vectorized switch statement.
It just has the nice property of being able to collapse cases with the same right-hand side into one line:
case_switch(
  num_vec,
  1 ~ "o",
  3 ~ "o",
  2 ~ "e",
  4 ~ "e",
  NA ~ "m"
)
case_switch(
  num_vec,
  c(1, 3) ~ "o",
  c(2, 4) ~ "e",
  NA ~ "m"
)
The whole point of case_switch() is to mimic the SQL "simple" CASE statement. Our case_when() handles the "searched" CASE statement. data.table has also been considering something like this https://github.com/Rdatatable/data.table/issues/4820
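To make the "simple" versus "searched" distinction concrete, here is a small sketch that writes the earlier numeric example both ways. It uses `case_match()`, the name used in this PR rather than the proposed `case_switch()`, and assumes a dplyr version that includes it; `searched` and `simple` are illustrative names.

```r
library(dplyr)

num_vec <- c(1:4, NA)

# "Searched" form: every branch is an arbitrary logical condition.
searched <- case_when(
  num_vec %in% c(1, 3) ~ "o",
  num_vec %in% c(2, 4) ~ "e",
  is.na(num_vec) ~ "m"
)

# "Simple" form: one input, and each branch lists the values to match against it.
simple <- case_match(
  num_vec,
  c(1, 3) ~ "o",
  c(2, 4) ~ "e",
  NA ~ "m"
)

identical(searched, simple)
```

The simple form names the input once, so it cannot express branches that test different variables; that case still belongs to `case_when()`.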
Do we want to supersede recode() in this PR or a separate one?
I'll leave that for another PR; I want to get this one in first.