RVerbalExpressions
RVerbalExpressions copied to clipboard
Method dispatch for `rx_string`
Basically boils down to detecting that .data
is not of rx_string
class and acting as though first argument is the value
argument.
Reference: http://adv-r.had.co.nz/S3.html
UPDATED: sanitize now has method dispatch, so we can simply write
## unexported function for sanitizing arguments
sanitize_args <- function(...){
if (missing(...)) return(NULL)
res <- sapply(list(...), sanitize)
Reduce(paste0, res)
}
is.rx_string <- function(x){
inherits(x, "rx_string")
}
# class constructor - also unexported function.
rx <- function(x){
if(is.rx_string(x)) return(x)
class(x) <- c("rx_string", class(x))
x
}
rx_literal <- function(.data, ...) {
UseMethod("rx_literal", .data)
}
rx_literal.character <- function(.data, ...){
res <- paste0(sanitize(.data), sanitize_args(...))
rx(res)
}
rx_literal.rx_string <- function(.data, ...) {
res <- paste0(.data, sanitize_args(...))
rx(res)
}
Now you dont need a constructor. Function works both in chain and stand alone
rx_literal("?@")
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"
rx_literal("?") %>% rx_literal("@")
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"
Hadley says we should also implement a few essential methods. We should rethink all of our functions with vectorization in mind.
When implementing a vector class, you should implement these methods:
length, [, [<-, [[, [[<-, c
. (If[
is implementedrev, head, and tail
should all work).
In ‘Code smells’ Jenny Bryan argued that whenever you have if..inherits
, it means you need to write a method.
Perhaps we need sanitize.rx_string
and sanitize.character
This is exactly what the package needs! What is the use case for sanitize.rx_string
, wouldn't it already be sanitized?
Exactly the point. It does nothing. It is needed to prevent sanitization. For rx_string
objects sanitize.rx_string
class method is called before sanitize.character
. I will put together end to end example later today.
Here's S3 method for sanitize
sanitize <- function(.data = NULL){
UseMethod("sanitize", .data)
}
sanitize.rx_string <- function(.data = NULL) {
.data
}
# your function, unchanged, although we will need to rethink the error message now
sanitize.default <- function(.data = NULL) {
if(missing(.data))
stop("The 'value' argument is missing. Did you forget to start the rx chain with rx()", .call = FALSE)
escape_chrs <- c(".", "|", "*", "?", "+", "(", ")", "{", "}", "^", "$", "\\", ":", "=", "[", "]")
string_chrs <- strsplit(.data, "")[[1]]
idx <- which(string_chrs %in% escape_chrs)
idx_new <- paste0("\\", string_chrs[idx])
paste0(replace(string_chrs, idx, idx_new), collapse = "")
}
rx_literal("?@") %>% sanitize()
#> [1] "\\?@"
#> attr(,"class")
#> [1] "rx_string" "character"
"?@" %>% sanitize()
#> [1] "\\?@"
I simplified the funcitons above assuming method dispatch for sanitize
Nice, I recently changed sanitize to:
sanitize <- function(.data = NULL) {
if(missing(.data))
stop("The 'value' argument is missing. Did you forget to start the rx chain with rx()?")
esc <- c(".", "|", "*", "?", "+", "(", ")", "{", "}", "^", "$", "\\", ":", "=", "[", "]")
gsub(paste0("([\\", paste0(collapse = "\\", esc), "])"), "\\\\\\1", .data, perl = TRUE)
}
The gsub
call comes from rex
. It's a bit more cryptic but also more concise. Do you prefer one over the other?
Regarding the error message, maybe something like "Nothing to sanitize, please provide a character vector or rx call."
I am fine with gsub
if you promise to maintain it. Six escape characters are horrifying, but you might be very well right. I am indifferent,
Haha, yes it does look scary. I mainly just assumed its bullet proof given who wrote the package. Let's stick with gsub
.
To do here:
- [x] Implement method dispatch for
sanitize
. Unexport it. - [ ] Make a template for method dispatch in multi-argument functions using
rx_literal
as example (also for two-argument functions, i.e. without...
).
Still relevant, because we really need to go away from the need to use rx()
constructor
I've been trying to figure out how to avoid rx()
and allow nested rx_
functions that have the ...
but I can't come up with anything. At this point, if users want to nest things, throw a dot in front.
rx() %>%
rx_either_of(
rx_find(., "foo"),
rx_find(., "bar")
)
One lazy solution I've came up with is:
`~` <- function(...) {
args <- as.list(sys.call())[-1]
args <- paste0("rx() %>% ", args)
args <- sapply(args, function(x) eval(parse(text = x)), USE.NAMES = FALSE)
paste0(args, collapse = "")
}
vals <- `~`
# but keep in mind
fortunes::fortune("answer is parse")
This allows something like:
# with tilde
rx() %>%
rx_group(
~ rx_either_of("cat", "dog")
) %>%
rx_literal(" food") %>%
stringr::str_extract(c("cat food", "dog food", "fish food"), .)
#> [1] "cat food" "dog food" NA
# with vals
rx() %>%
rx_group(
vals(rx_either_of("cat", "dog"))
) %>%
rx_literal(" food") %>%
stringr::str_extract(c("cat food", "dog food", "fish food"), .)
#> [1] "cat food" "dog food" NA