RVerbalExpressions
Syntax for rx_or()
Right now we have an rx_or() implementation that compares .data and value:
##### Do not run
rx() %>%
rx_find("a") %>%
rx_or("b") # or at best rx_or(rx_find("b"))
In the comments you mentioned:
##### Do not run
# Not sure if I like this. I would prefer:
# find(value = "foo") %>%
# or() %>%
# find("bar")
# Rather than having to nest a rule inside of or(), maybe use glue?
Might the solution be similar to how we have now organized rx_one_of() in the dev branch? Something like:
##### Do not run
rx() %>%
rx_find("gr") %>%
either_of(rx_find("a"), rx_find("e")) %>%
rx_find("y")
In a sense, this is rx_one_of with (?:a|b) instead of [ab] and limited to two arguments only. I actually believe nothing prevents us from allowing more arguments, if we go down this route. I think going this route will add consistency to the package.
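To make the difference between the two forms concrete, here is a minimal sketch in base R only (no RVerbalExpressions functions are called, and the sample words are made up for illustration). It shows that [ae] and (?:a|e) agree for single characters, while the group form also accepts longer alternatives and more than two of them:

```r
# Base R only; the sample words are invented for illustration.
words <- c("gray", "grey", "groy", "greay")

# Character class: exactly one of the listed characters.
char_class <- "^gr[ae]y$"

# Non-capturing group: one of the listed alternatives.
alternation <- "^gr(?:a|e)y$"

# Unlike a character class, a group can hold multi-character
# alternatives and any number of them.
multi_char <- "^gr(?:a|e|ea)y$"

class_hits <- grepl(char_class, words, perl = TRUE)
alt_hits <- grepl(alternation, words, perl = TRUE)
multi_hits <- grepl(multi_char, words, perl = TRUE)

print(class_hits)
print(alt_hits)
print(multi_hits)
```

The first two patterns match the same words; only the third also matches "greay", which is the case a character class cannot express.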
While adding rx_either_of, I stumbled upon the inherent eagerness of the | alternation operator:
rx_either_of <- function(.data = NULL, ..., rep = NULL, mode = "greedy") {
  if (!inherits(.data, "rx_string")) {
    stop("This function is not to be used as first element of the pipe! Please start pipe with constructor function rx()")
  }
  # Sanitize each alternative, then join them inside a non-capturing group
  san_args <- sapply(list(...), sanitize)
  san_args_peeled <- peel_set(san_args)
  res <- paste0(.data, "(?:", paste0(san_args_peeled, collapse = "|"), ")", parse_rep_mode(rep, mode))
  new_rx(res)
}
library(RVerbalExpressions)
# Alternation is eager!
rx() %>%
rx_either_of("GetValue", "Get", "Set", "SetValue") %>%
stringr::str_extract_all("Get, GetValue, Set or SetValue", .) %>%
.[[1]]
#> [1] "Get" "GetValue" "Set" "Set"
# Avoid eagerness with order of values
rx() %>%
rx_either_of("GetValue", "Get", "SetValue", "Set") %>%
stringr::str_extract_all("Get, GetValue, Set or SetValue", .) %>%
.[[1]]
#> [1] "Get" "GetValue" "Set" "SetValue"
# Avoid eagerness with word boundaries
rx() %>%
rx_word_edge() %>%
rx_either_of("GetValue", "Get", "Set", "SetValue") %>%
rx_word_edge() %>%
stringr::str_extract_all("Get, GetValue, Set or SetValue", .) %>%
.[[1]]
#> [1] "Get" "GetValue" "Set" "SetValue"
Should rx_either_of have an eager option that turns on word boundaries? I'd prefer not to add more arguments, but I'm curious what you think. If we do decide to add eager, should it default to TRUE? I think this is a rare case, so I'd prefer it to default to FALSE if we add the argument.
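For discussion, here is a rough sketch of what such an option could look like, written in base R so it runs without the package. Both either_of_pattern() and extract_all() are hypothetical helpers invented for this illustration, not part of RVerbalExpressions, and the sketch assumes the alternatives are plain literals with no regex metacharacters (it does no sanitizing):

```r
# Hypothetical helper, NOT part of RVerbalExpressions.
# Assumes every alternative is a plain literal (no sanitizing is done).
either_of_pattern <- function(..., eager = FALSE) {
  alternatives <- paste0(c(...), collapse = "|")
  pattern <- paste0("(?:", alternatives, ")")
  # As floated in the question: eager = TRUE wraps the group in word
  # boundaries so a short alternative cannot win inside a longer word.
  if (eager) {
    pattern <- paste0("\\b", pattern, "\\b")
  }
  pattern
}

# Hypothetical convenience wrapper around base R matching.
extract_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

text <- "Get, GetValue, Set or SetValue"

without_edges <- extract_all(
  either_of_pattern("Get", "GetValue", "Set", "SetValue"),
  text
)
with_edges <- extract_all(
  either_of_pattern("Get", "GetValue", "Set", "SetValue", eager = TRUE),
  text
)

print(without_edges)
print(with_edges)
```

With the default, the shorter alternatives win and the result is "Get" "Get" "Set" "Set"; with eager = TRUE the full words come back regardless of argument order. Defaulting to FALSE keeps the generated pattern identical to what rx_either_of produces today.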