
Syntax for rx_or()

Open · dmi3kno opened this issue 6 years ago · 1 comment

Right now we have an rx_or() implementation that matches either .data or value:

##### Do not run
rx() %>% 
  rx_find("a") %>%
  rx_or("b") # or at best rx_or(rx_find("b"))

In the comments you mentioned:

##### Do not run
  # Not sure if I like this. I would prefer:
  # find(value = "foo") %>%
  #   or() %>%
  #   find("bar")
  # Rather than having to nest a rule inside of or(), maybe use glue?
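
To see what the chained find() %>% or() %>% find() style would have to handle, here is a minimal base-R sketch. The helpers find_sketch() and or_sketch() are hypothetical stand-ins, not RVerbalExpressions functions, and they assume or() simply appends a bare `|`. Because `|` has the lowest precedence in a regex, anything appended after the second find() binds only to the right-hand branch.

```r
# Hypothetical chain-style helpers (NOT RVerbalExpressions functions).
# Each appends a fragment to the pattern built so far; values are assumed
# to contain no regex metacharacters.
find_sketch <- function(.data = "", value) paste0(.data, value)
or_sketch   <- function(.data = "") paste0(.data, "|")

# Equivalent of: find("foo") %>% or() %>% find("bar") %>% find("baz")
pattern <- find_sketch(value = "foo")
pattern <- or_sketch(pattern)
pattern <- find_sketch(pattern, "bar")
pattern <- find_sketch(pattern, "baz")
print(pattern)  # "foo|barbaz"

inputs <- c("foo", "barbaz", "foobaz", "bar")

# `|` splits the whole pattern: "foo" OR "barbaz"
chained_hits <- grepl(pattern, inputs, perl = TRUE)            # TRUE TRUE TRUE FALSE

# What the chain presumably means: (foo OR bar) followed by baz
grouped_hits <- grepl("(?:foo|bar)baz", inputs, perl = TRUE)   # FALSE TRUE TRUE FALSE
```

The difference on "foo" suggests that a chained or() would need to track where the alternation group starts and ends, which the nested or grouped forms get for free.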

Might the solution be similar to how we now organize rx_one_of() (in the dev branch)?

##### Do not run
rx() %>%
  rx_find("gr") %>%
  rx_either_of(rx_find("a"), rx_find("e")) %>%
  rx_find("y")

In a sense, this is rx_one_of() producing (?:a|b) instead of [ab], limited to two arguments. I actually believe nothing prevents us from allowing more arguments if we go down this route, and I think this approach would make the package more consistent.
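
A minimal base-R sketch of the idea, assuming the proposed function wraps all alternatives in a single non-capturing group. either_of_sketch() is a hypothetical name, and its arguments are assumed to be already-escaped regex fragments. It shows that the grouped alternation behaves like the [ae] class for single characters, while also accepting multi-character alternatives and more than two of them.

```r
# Hypothetical sketch of the proposed either_of(): one non-capturing group
# keeps the alternation from leaking into the surrounding pattern.
# Any number of alternatives is accepted.
either_of_sketch <- function(.data = "", ...) {
  alternatives <- c(...)
  paste0(.data, "(?:", paste0(alternatives, collapse = "|"), ")")
}

# rx_find("gr") %>% either_of("a", "e") %>% rx_find("y")
pattern <- paste0(either_of_sketch("gr", "a", "e"), "y")
print(pattern)  # "gr(?:a|e)y"

words <- c("gray", "grey", "gruy")
grepl(pattern, words, perl = TRUE)    # TRUE TRUE FALSE
grepl("gr[ae]y", words, perl = TRUE)  # same result with the [ae] class

# Unlike a character class, alternatives can be longer than one character
multi_hits <- grepl(either_of_sketch("", "cat", "dog", "bird"),
                    c("hotdog", "catfish", "fish"), perl = TRUE)  # TRUE TRUE FALSE
```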

dmi3kno, Mar 14 '19 11:03

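While adding rx_either_of(), I stumbled upon the inherent eagerness of the | alternation operator: the regex engine tries the alternatives from left to right and takes the first one that matches, even if a later alternative would match more text.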

rx_either_of <- function(.data = NULL, ..., rep = NULL, mode = "greedy") {
  if (!inherits(.data, "rx_string")) stop("This function cannot be used as the first element of the pipe! Please start the pipe with the constructor function rx()")
  san_args <- sapply(list(...), sanitize)
  san_args_peeled <- peel_set(san_args)
  res <- paste0(.data, "(?:", paste0(san_args_peeled, collapse = "|"), ")", parse_rep_mode(rep, mode))
  new_rx(res)
}
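
The implementation relies on internal helpers (sanitize(), peel_set(), parse_rep_mode(), new_rx()) that are not shown here. Assuming sanitize() backslash-escapes regex metacharacters so each value matches literally, the core paste0() step can be sketched standalone in base R. escape_sketch() and either_of_escaped() are hypothetical names; the sketch ignores set peeling and the rep/mode handling.

```r
# Hypothetical stand-in for the internal sanitize(): backslash-escape
# regex metacharacters so every value is matched literally.
escape_sketch <- function(x) {
  gsub("([.|()\\\\^{}+$*?\\[\\]])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical core of rx_either_of(): escape, join with "|", group.
either_of_escaped <- function(.data = "", ...) {
  alternatives <- escape_sketch(c(...))
  paste0(.data, "(?:", paste0(alternatives, collapse = "|"), ")")
}

pattern <- either_of_escaped("", "a.c", "(x)")
# The resulting regex is (?:a\.c|\(x\)), so "." and "()" are literal
hits <- grepl(pattern, c("a.c", "abc", "(x)", "x"), perl = TRUE)  # TRUE FALSE TRUE FALSE
```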

library(RVerbalExpressions)

# Alternation is eager!
rx() %>% 
  rx_either_of("GetValue", "Get", "Set", "SetValue") %>% 
  stringr::str_extract_all("Get, GetValue, Set or SetValue", .) %>% 
  .[[1]]
#> [1] "Get"      "GetValue" "Set"      "Set"

# Avoid eagerness by ordering the values (longer alternatives first)
rx() %>% 
  rx_either_of("GetValue", "Get", "SetValue", "Set") %>% 
  stringr::str_extract_all("Get, GetValue, Set or SetValue", .) %>% 
  .[[1]]
#> [1] "Get"      "GetValue" "Set"      "SetValue"

# Avoid eagerness with word boundaries
rx() %>% 
  rx_word_edge() %>% 
  rx_either_of("GetValue", "Get", "Set", "SetValue") %>% 
  rx_word_edge() %>% 
  stringr::str_extract_all("Get, GetValue, Set or SetValue", .) %>% 
  .[[1]]
#> [1] "Get"      "GetValue" "Set"      "SetValue"
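
The three cases above can be reproduced with base R alone (no stringr or magrittr), and the manual reordering can be automated by sorting literal alternatives longest first, so a shorter prefix such as "Set" can never shadow "SetValue". extract_all() and longest_first() are hypothetical helpers written for this sketch.

```r
text   <- "Get, GetValue, Set or SetValue"
values <- c("GetValue", "Get", "Set", "SetValue")

# Hypothetical helper: all non-overlapping matches of `pattern` in `x`
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

# Hypothetical helper: sort literal alternatives longest first
longest_first <- function(x) x[order(nchar(x), decreasing = TRUE)]

eager   <- paste0("(?:", paste0(values, collapse = "|"), ")")
ordered <- paste0("(?:", paste0(longest_first(values), collapse = "|"), ")")
bounded <- paste0("\\b(?:", paste0(values, collapse = "|"), ")\\b")

extract_all(eager, text)    # "Get" "GetValue" "Set" "Set"
extract_all(ordered, text)  # "Get" "GetValue" "Set" "SetValue"
extract_all(bounded, text)  # "Get" "GetValue" "Set" "SetValue"
```

Sorting by length only helps when the alternatives are plain literals; for arbitrary sub-patterns the length of the pattern string says little about the length of what it matches.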

Should rx_either_of() have an eager option that turns on word boundaries? I'd prefer not to add more arguments, but I'm curious what you think. If we do add eager, should it default to TRUE? I think this is a rare case, so if we add the argument I'd prefer it to default to FALSE.
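
One possible shape for that option, as a base-R sketch: either_of_sketch() is hypothetical, and the argument name eager is borrowed from the proposal rather than taken from any existing RVerbalExpressions signature. With eager = TRUE the group is wrapped in \b word boundaries; the default FALSE leaves the pattern unchanged.

```r
# Hypothetical sketch: `eager = TRUE` wraps the alternation in \b ... \b.
either_of_sketch <- function(.data = "", ..., eager = FALSE) {
  group <- paste0("(?:", paste0(c(...), collapse = "|"), ")")
  if (eager) group <- paste0("\\b", group, "\\b")
  paste0(.data, group)
}

text      <- "Get, GetValue, Set or SetValue"
p_default <- either_of_sketch("", "GetValue", "Get", "Set", "SetValue")
p_eager   <- either_of_sketch("", "GetValue", "Get", "Set", "SetValue", eager = TRUE)

default_hits <- regmatches(text, gregexpr(p_default, text, perl = TRUE))[[1]]
# "Get" "GetValue" "Set" "Set"
eager_hits   <- regmatches(text, gregexpr(p_eager, text, perl = TRUE))[[1]]
# "Get" "GetValue" "Set" "SetValue"
```

One trade-off worth weighing: word boundaries also stop matches inside longer words, so with eager = TRUE "Get" is no longer found in "Getter", while the default pattern still finds it.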

tylerlittlefield, Mar 16 '19 17:03