stringr
`str_detect_all` for multiple patterns?
If you want to filter all rows matching any of several string patterns, it would be handy to have a `str_detect_all()` function for this task. I have used `purrr` to implement this, but maybe there is already a better way available?
str_detect_all <- function(string, patterns, negate = FALSE) {
  n_patterns <- length(patterns)
  # str_detect() is vectorised over both arguments, so a single pattern or
  # one pattern per string is passed straight through.
  if (n_patterns == 1 || n_patterns == length(string)) {
    return(str_detect(string = string, pattern = patterns, negate = negate))
  }
  # Otherwise test every pattern against all strings, then combine the
  # resulting logical vectors with an element-wise OR.
  patterns %>%
    purrr::map(~ str_detect(string = string, pattern = .x, negate = negate)) %>%
    purrr::reduce(.f = `|`)
}
For example, to filter the `mtcars` data for cars with any of `c("Lotus", "Duster", "Drive")` in their name, you could use:
> mtcars %>% as_tibble(rownames = "car") %>% filter(str_detect_all(car, c("Lotus", "Duster", "Drive")))
# A tibble: 3 × 12
  car              mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Hornet 4 Drive  21.4     6 258     110  3.08  3.22  19.4     1     0     3     1
2 Duster 360      14.3     8 360     245  3.21  3.57  15.8     0     0     3     4
3 Lotus Europa    30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
FWIW, you can already achieve that with `str_detect()` and regular expressions:
mtcars |>
  tibble::as_tibble(rownames = "car") |>
  dplyr::filter(stringr::str_detect(car, "Lotus|Duster|Drive"))
#> # A tibble: 3 × 12
#>   car            mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Hornet 4 Dr…  21.4     6 258     110  3.08  3.22  19.4     1     0     3     1
#> 2 Duster 360    14.3     8 360     245  3.21  3.57  15.8     0     0     3     4
#> 3 Lotus Europa  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
Created on 2024-01-05 by the reprex package (v2.0.1)
Yeah, this is the approach that I'd recommend, since the regexp engine can compile the whole alternation into a single pattern and match it efficiently. If you're accepting user-provided strings, make sure to call `str_escape()` on them first to escape any special characters they might contain.
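To make that concrete, here is a minimal sketch of how a vector of literal strings could be turned into one alternation pattern. The helper name `str_detect_any()` is hypothetical and not part of stringr; the sketch assumes stringr 1.5.0 or later, where `str_escape()` is available.

```r
library(stringr)

# Hypothetical helper (not part of stringr): TRUE where `string` contains
# any of the literal strings in `patterns`.
str_detect_any <- function(string, patterns, negate = FALSE) {
  # Escape regex metacharacters so each entry is matched literally,
  # then join the alternatives with "|" into a single regular expression.
  pattern <- str_c(str_escape(patterns), collapse = "|")
  str_detect(string, pattern, negate = negate)
}

cars <- rownames(mtcars)
cars[str_detect_any(cars, c("Lotus", "Duster", "Drive"))]
#> [1] "Hornet 4 Drive" "Duster 360"     "Lotus Europa"

# A user-supplied "." is matched literally rather than as "any character".
str_detect_any(c("a.b", "axb"), "a.b")
#> [1]  TRUE FALSE
```

Because the result is a plain logical vector, it can be dropped into `dplyr::filter()` in the same way as the `str_detect()` call above.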