Implement case_when
Related: imported issue https://github.com/tidyverse/dplyr/issues/5871 by @jzadra:
Having used case_when() for years, I was quite worried today when I discovered that case_when() handles NA values in the test differently than if_else() and ifelse():

After discovering this behavior, I see that there is a note about this buried in the case_when() documentation examples section:
# Note that NA values in the vector x do not get special treatment. If you want
# to explicitly handle NA values you can use the `is.na` function:
x[2:4] <- NA_real_
case_when(
x %% 35 == 0 ~ "fizz buzz",
x %% 5 == 0 ~ "fizz",
x %% 7 == 0 ~ "buzz",
is.na(x) ~ "nope",
TRUE ~ as.character(x)
)
So now I know, but polling my team members at work reveals that none of them were aware of this behavior either.
My assumption has been that case_when()handles NA values in the test the same way if_else() does and return an NA UNLESS the NA values are specifically dealt with, rather than the other way around where they get the TRUE value if left unspecified. Consider that a basic test like NA == TRUE yields NA, and NA == NA also yields NA, further adding to my feeling that this is unexpected behavior: NA test values should yield an NA result for the TRUE ~ x line in case_when().
But regardless of whether this behavior is better for some reason (I'd love to understand if so!), it's obviously way too late to change it. I think it might be a good idea to include a warning/message when NAs exist in the tests but are treated as TRUE, ie "NA values in the data were handled by TRUE ~. Did you mean to specify a case for NA values?". There is precedent for this in other functions such as converting a character vector of mostly numeric values when there are some non-numeric values.
(moved here from https://github.com/tidyverse/dplyr/issues/5876)
case_when() and if_else() perform type-checks and it is for the best in most cases.
However, some data type are usually fully exchangeable, e.g. integers and doubles.
Therefore, you can experience some unwanted type-checking that makes little to no sense:
library(tidyverse)
df = read.table(header=TRUE, text="
id check count_x
1 8 0 0
2 8 1 1
3 8 1 2
8 8 0 3
9 8 0 3")
case_when(
df$count_x==0 ~ df$check,
df$count_x>0 ~ 1,
)
#> Error: must be an integer vector, not a double vector.
if_else(df$count_x==0, df$check, 1)
#> Error: `false` must be an integer vector, not a double vector.
Created on 2021-05-06 by the reprex package (v2.0.0)
Of course, casting the variable to plain numeric will work, using as.numeric() or simply +0, but it seems quite unnecessary:
case_when(
df$count_x==0 ~ df$check+0,
df$count_x>0 ~ 1,
)
#> [1] 0 1 1 1 1
Created on 2021-05-06 by the reprex package (v2.0.0)
Maybe the type-checking could be a bit more flexible in some cases, such as this one.
feature request: check the size of condition vectors and warn or error if they are not all equal, possibly with an exception for the last case so that a default can be specified by putting TRUE ~ 'default_val' at the end.
For reference, here's how data.table::fcase handles conditions with differing lengths:
data.table::fcase(
c(FALSE, FALSE), 'a',
TRUE, 'b',
c(FALSE, TRUE), 'c'
)
#> Error in data.table::fcase(c(FALSE, FALSE), "a", TRUE, "b", c(FALSE, TRUE), : Argument #3 has a different length than argument #1. Please make sure all logical conditions have the same length.
Created on 2021-10-12 by the reprex package (v2.0.1)
Currently dplyr::case_when silently recycles:
dplyr::case_when(
c(FALSE, FALSE) ~ 'a',
TRUE ~ 'b',
c(FALSE, TRUE) ~ 'c'
)
#> [1] "b" "b"
Created on 2021-10-12 by the reprex package (v2.0.1)
@eutwt I think case_when() would be consistent with the standard tidyverse recycling rules, which would not yield an error here.