Escaped question marks in the replace function match between characters
For context, I was attempting to match YouTube URLs that include the query string (e.g. watch?v=...), which of course required me to escape the question mark.
I'm not very familiar with re2, but from what I can tell, searching for an escaped question mark (\?) does not function as intended. At least, this is the case for the replace rewrite rule.
Effect: Using the rewrite rule replace("\?"|"A") will add the letter A between every character.
Expected: Using the rewrite rule replace("\?"|"A") will replace all question marks with the letter A.
Additionally, including \? seems to break the whole search string, causing the same issue as above. For example, if I use replace("watch\?v=(.*?)\""|"A"), it still adds an A between every character.
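To make the expected and observed behavior concrete, here is a minimal sketch using Go's standard regexp package, which follows RE2 syntax. It is not taken from the application's source; the helper name replaceAll is hypothetical. It shows that a pattern of \? replaces only the literal question mark, while an empty pattern matches at every position and so produces the "A between every character" symptom. That suggests, as an assumption only, that the pattern may be arriving at the regex engine empty.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceAll is a hypothetical helper: it compiles pattern with Go's
// RE2-based regexp package and replaces every match in input.
func replaceAll(pattern, replacement, input string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(input, replacement)
}

func main() {
	// The regex \? (backslash, question mark) matches a literal "?".
	fmt.Println(replaceAll(`\?`, "A", "watch?v=abc")) // watchAv=abc

	// An empty pattern matches at every position, including both ends.
	fmt.Println(replaceAll(``, "A", "abc")) // AaAbAcA
}
```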
I also tried my regex on Regex Planet to confirm that it produces the right results elsewhere, and it did.
For completeness' sake, I tried replace("?"|"A") and it had no effect on the content.
I know regex is used in other contexts too. It has already taken me hours to identify and replicate this case, so I haven't tested those, but I imagine the same issue applies there as well.
I've done less testing on it, but it appears that escaping a period (\.) has similar behavior.
After some more testing, I realized that escaping these regex-specific characters works when the characters are double-escaped (e.g. \\., \\?).
It may seem obvious in hindsight, but the reason I never considered it is that I needed to escape quotation marks (\") to include them in the regex search, which is done with a single backslash and cannot be done with a double backslash.
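One possible explanation for this distinction, offered as an assumption rather than something confirmed from the source code, is that each rule argument is first decoded as a quoted string literal and only then handed to the regex engine. The sketch below uses Go's strconv.Unquote as a stand-in for that hypothetical decoding step. Under that assumption, \" and \\ are valid string escapes, while \? is not, so the single-escaped pattern would be rejected before it ever reaches the regex compiler.

```go
package main

import (
	"fmt"
	"strconv"
)

// unquote stands in for a hypothetical argument parser that treats each
// rule argument as a Go-style double-quoted string literal.
func unquote(arg string) (string, error) {
	return strconv.Unquote(arg)
}

func main() {
	// Each entry is the argument exactly as typed in the rule, quotes included.
	args := []string{
		`"\?"`,  // single escape: \? is not a valid string escape
		`"\\?"`, // double escape: decodes to the regex \?
		`"\""`,  // single-escaped quote: decodes to "
		`"\\""`, // double-escaped quote: the quote is left unescaped
	}
	for _, arg := range args {
		value, err := unquote(arg)
		if err != nil {
			fmt.Printf("%s is rejected: %v\n", arg, err)
			continue
		}
		fmt.Printf("%s decodes to %s\n", arg, value)
	}
}
```

If the parsing really works this way, a decoder that ignores the error could end up with an empty pattern, which would be consistent with the replacement being inserted between every character.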
This revelation changes the context of the issue enough that I might need to reconsider it as a whole. Does something need to be done about this, whether that means changing the behavior or just documenting the somewhat unintuitive distinction?