hxcpp

\u in regular expressions

Open: ncannasse opened this issue 6 years ago • 1 comment

PCRE uses \x whereas JS regexps use \u for hexadecimal character code sequences.

I think it would be better to support only \u for better compatibility. This could be done at compile time, but that would only account for "constant" regexps. Instead, you can #define PCRE_JAVASCRIPT_COMPAT when building PCRE as explained here: https://www.pcre.org/original/doc/html/pcrepattern.html#SEC5
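For illustration, the rewriting approach mentioned above could look roughly like the sketch below, which turns JS-style `\uHHHH` escapes into the PCRE form `\x{HHHH}`. The function name `rewriteUnicodeEscapes` is hypothetical and is not part of hxcpp or PCRE. The sketch assumes exactly four hex digits after `\u`, and it only helps for patterns that are available as strings before they reach the regex engine.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not an hxcpp or PCRE API): rewrites JS-style \uHHHH
// escapes in a pattern string to the PCRE form \x{HHHH}.
std::string rewriteUnicodeEscapes(const std::string& pattern) {
    std::string out;
    out.reserve(pattern.size());
    std::size_t i = 0;
    while (i < pattern.size()) {
        // Ordinary characters (and a lone trailing backslash) are copied as-is.
        if (pattern[i] != '\\' || i + 1 >= pattern.size()) {
            out += pattern[i++];
            continue;
        }
        // A backslash followed by 'u' and exactly four hex digits is rewritten.
        bool isUnicodeEscape = pattern[i + 1] == 'u' && i + 5 < pattern.size();
        for (std::size_t k = i + 2; isUnicodeEscape && k < i + 6; ++k) {
            isUnicodeEscape =
                std::isxdigit(static_cast<unsigned char>(pattern[k])) != 0;
        }
        if (isUnicodeEscape) {
            out += "\\x{";
            out.append(pattern, i + 2, 4);  // the four hex digits
            out += '}';
            i += 6;
        } else {
            // Any other escape is copied as a pair, so an escaped backslash
            // followed by "u0041" is never mistaken for a \u escape.
            out += pattern[i];
            out += pattern[i + 1];
            i += 2;
        }
    }
    return out;
}
```

Under these assumptions, `rewriteUnicodeEscapes("\\u0041+")` yields the pattern `\x{0041}+`. Note that PCRE generally requires UTF mode for `\x{...}` values above 255 in the 8-bit library, so a rewrite like this is only part of the picture.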

ncannasse, Apr 14 '18 08:04

With PCRE2, PCRE_JAVASCRIPT_COMPAT was removed and replaced with PCRE2_ALT_BSUX, PCRE2_ALLOW_EMPTY_CLASS and PCRE2_MATCH_UNSET_BACKREF. See 2015-01-05: [pcre-dev] PCRE2 is released:

[...] The PCRE_JAVASCRIPT_COMPAT option has been split into independent functional options PCRE2_ALT_BSUX, PCRE2_ALLOW_EMPTY_CLASS, and PCRE2_MATCH_UNSET_BACKREF.

PCRE2_ALT_BSUX seems to be the key to adding \u support (and altering \x) in a more JS-like way. There is also PCRE2_EXTRA_ALT_BSUX, which implies PCRE2_ALT_BSUX and adds ECMAScript 6 style \u{hhh..} hexadecimal character codes.

The braced escape construct \N{U+hh..} is also available and should work when Unicode/UTF mode is enabled (regardless of how \x and \u are treated); see pcre2syntax ESCAPED CHARACTERS and pcre2pattern BACKSLASH.

Uzume, Mar 25 '23 14:03