RVerbalExpressions icon indicating copy to clipboard operation
RVerbalExpressions copied to clipboard

Method dispatch for `rx_string`

Open dmi3kno opened this issue 5 years ago • 11 comments

Basically boils down to detecting that .data is not of rx_string class and acting as though first argument is the value argument. Reference: http://adv-r.had.co.nz/S3.html

UPDATED: sanitize now has method dispatch, so we can simply write

## unexported function for sanitizing arguments
sanitize_args <- function(...){
   if (missing(...)) return(NULL) 
  res <- sapply(list(...), sanitize) 
  Reduce(paste0, res)
}

is.rx_string <- function(x){
  inherits(x, "rx_string")
}

# class constructor - also unexported function. 
rx <- function(x){
  if(is.rx_string(x)) return(x)
  class(x) <- c("rx_string", class(x)) 
  x
}

rx_literal <- function(.data, ...) {
  UseMethod("rx_literal", .data)
}

rx_literal.character <- function(.data, ...){
  res <- paste0(sanitize(.data), sanitize_args(...))
  rx(res)
}

rx_literal.rx_string <- function(.data, ...) {
  res <- paste0(.data, sanitize_args(...))
  rx(res)
}

Now you dont need a constructor. Function works both in chain and stand alone

rx_literal("?@")
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"

rx_literal("?") %>% rx_literal("@")
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"

Hadley says we should also implement a few essential methods. We should rethink all of our functions with vectorization in mind.

When implementing a vector class, you should implement these methods: length, [, [<-, [[, [[<-, c. (If [ is implemented rev, head, and tail should all work).

dmi3kno avatar Mar 10 '19 10:03 dmi3kno

In ‘Code smells’ Jenny Bryan argued that whenever you have if..inherits, it means you need to write a method.

Perhaps we need sanitize.rx_string and sanitize.character

dmi3kno avatar Mar 10 '19 13:03 dmi3kno

This is exactly what the package needs! What is the use case for sanitize.rx_string, wouldn't it already be sanitized?

tylerlittlefield avatar Mar 10 '19 17:03 tylerlittlefield

Exactly the point. It does nothing. It is needed to prevent sanitization. For rx_string objects sanitize.rx_string class method is called before sanitize.character. I will put together end to end example later today.

dmi3kno avatar Mar 10 '19 18:03 dmi3kno

Here's S3 method for sanitize

sanitize <- function(.data = NULL){
  UseMethod("sanitize", .data) 
}

sanitize.rx_string <- function(.data = NULL) {
  .data
}
# your function, unchanged, although we will need to rethink the error message now
sanitize.default <- function(.data = NULL) {
  if(missing(.data))
    stop("The 'value' argument is missing. Did you forget to start the rx chain with rx()", .call = FALSE)
  escape_chrs <- c(".", "|", "*", "?", "+", "(", ")", "{", "}", "^", "$", "\\", ":", "=", "[", "]")
  string_chrs <- strsplit(.data, "")[[1]]
  idx <- which(string_chrs %in% escape_chrs)
  idx_new <- paste0("\\", string_chrs[idx])
  paste0(replace(string_chrs, idx, idx_new), collapse = "")
}

rx_literal("?@") %>% sanitize()
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"

"?@" %>% sanitize()
#> [1] "\\?@"

I simplified the funcitons above assuming method dispatch for sanitize

dmi3kno avatar Mar 10 '19 18:03 dmi3kno

Nice, I recently changed sanitize to:

sanitize <- function(.data = NULL) {
  if(missing(.data))
    stop("The 'value' argument is missing. Did you forget to start the rx chain with rx()?")
  esc <- c(".", "|", "*", "?", "+", "(", ")", "{", "}", "^", "$", "\\", ":", "=", "[", "]")
  gsub(paste0("([\\", paste0(collapse = "\\", esc), "])"), "\\\\\\1", .data, perl = TRUE)
}

The gsub call comes from rex. It's a bit more cryptic but also more concise. Do you prefer one over the other?

Regarding the error message, maybe something like "Nothing to sanitize, please provide a character vector or rx call."

tylerlittlefield avatar Mar 10 '19 18:03 tylerlittlefield

I am fine with gsub if you promise to maintain it. Six escape characters are horrifying, but you might be very well right. I am indifferent,

dmi3kno avatar Mar 10 '19 18:03 dmi3kno

Haha, yes it does look scary. I mainly just assumed its bullet proof given who wrote the package. Let's stick with gsub.

tylerlittlefield avatar Mar 10 '19 18:03 tylerlittlefield

To do here:

  • [x] Implement method dispatch for sanitize. Unexport it.
  • [ ] Make a template for method dispatch in multi-argument functions using rx_literal as example (also for two-argument functions, i.e. without ...).

dmi3kno avatar Mar 10 '19 21:03 dmi3kno

Still relevant, because we really need to go away from the need to use rx() constructor

dmi3kno avatar Mar 15 '19 07:03 dmi3kno

I've been trying to figure out how to avoid rx() and allow nested rx_ functions that have the ... but I can't come up with anything. At this point, if users want to nest things, throw a dot in front.

rx() %>% 
  rx_either_of(
    rx_find(., "foo"),
    rx_find(., "bar")
  )

tylerlittlefield avatar Mar 15 '19 15:03 tylerlittlefield

One lazy solution I've came up with is:

`~` <- function(...) {
  args <- as.list(sys.call())[-1]
  args <- paste0("rx() %>% ", args)
  args <- sapply(args, function(x) eval(parse(text = x)), USE.NAMES = FALSE)
  paste0(args, collapse = "")
}

vals <- `~`

# but keep in mind
fortunes::fortune("answer is parse")

This allows something like:

# with tilde
rx() %>%
  rx_group(
    ~ rx_either_of("cat", "dog")
  ) %>%
  rx_literal(" food") %>%
  stringr::str_extract(c("cat food", "dog food", "fish food"), .)
#> [1] "cat food" "dog food" NA

# with vals
rx() %>%
  rx_group(
    vals(rx_either_of("cat", "dog"))
  ) %>%
  rx_literal(" food") %>%
  stringr::str_extract(c("cat food", "dog food", "fish food"), .)
#> [1] "cat food" "dog food" NA

tylerlittlefield avatar Mar 21 '19 17:03 tylerlittlefield