qdapRegex

Invalid patterns evaluated as valid by is.regex()

Open pieterjanvc opened this issue 5 years ago • 1 comments

Hi,

Let me start by saying I really like your package! Very useful :)

However, I found several patterns that throw an error in R (using stringr::str_detect()) even though your is.regex() function evaluates them as valid regex. They all involve the curly brace {. These patterns might be valid in other software, but in R they fail in all stringr functions and in most base grep cases.

Patterns that throw error:

  • "{"
  • "{}"
  • "{1,2}" (this one is 'valid' with the base grep function, but returns incorrect results)
  • "\d{1,2,3}"
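A minimal sketch for reproducing the comparison, assuming stringr is installed. The names `patterns`, `base_ok`, `stringr_ok` and `result` are only illustrative, and the backslash is doubled so that the last pattern is a valid R string. No output is shown because it can differ between R and stringr versions.

```r
# Patterns from the list above (backslash escaped for R string syntax)
patterns <- c("{", "{}", "{1,2}", "\\d{1,2,3}")

# Base R check with perl = TRUE: does gsub() accept the pattern without an error?
base_ok <- vapply(patterns, function(p) {
    out <- suppressWarnings(try(gsub(p, "", "hello", perl = TRUE), silent = TRUE))
    !inherits(out, "try-error")
}, logical(1))

# stringr check: does str_detect() accept the pattern without an error?
stringr_ok <- vapply(patterns, function(p) {
    tryCatch({
        stringr::str_detect("hello", p)
        TRUE
    }, error = function(e) FALSE)
}, logical(1))

# One row per pattern, one column per engine
result <- data.frame(pattern = patterns, base_perl = base_ok,
                     stringr = stringr_ok, row.names = NULL)
print(result)
```

Rows where the two logical columns disagree are the patterns that one engine accepts and the other rejects.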

Grtz, PJ

pieterjanvc, Jun 08 '20 22:06

Hi, thanks for the report. This happens because is.regex relies on base R to check the regex (a fairly low-level check):

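is.regex <- function(pattern) {

    # Try the pattern in a throwaway substitution; gsub() errors on a regex it cannot compile
    out <- suppressWarnings(try(gsub(pattern, "", "hello", perl=TRUE), silent = TRUE))
    # The pattern counts as valid only if the call did not fail
    ifelse(inherits(out, "try-error"), FALSE, TRUE)

}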

Would you be willing to submit a pull request rewriting is.regex to use stringi functions for the check?
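For anyone picking this up, here is a rough sketch of what a stringi-based check might look like. It is only an illustration, not the package's implementation: `is_regex_icu` is a hypothetical name, and it assumes that stringi::stri_detect_regex() raises an R error when the ICU engine cannot compile the pattern.

```r
# Hypothetical stringi-based variant of is.regex(); not part of qdapRegex.
is_regex_icu <- function(pattern) {
    tryCatch({
        # Run the pattern against a throwaway string; only success or failure matters
        stringi::stri_detect_regex("hello", pattern)
        TRUE
    }, error = function(e) FALSE)
}

is_regex_icu("[a-z]+")
is_regex_icu("(")
```

Note that this only catches errors. Edge cases that stringi reports as warnings (an empty pattern, for example) would need separate handling, and a pattern that ICU accepts may still be rejected by base R's engines, or the other way round.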

trinker, Apr 27 '22 12:04