hxcpp
\u in regular expressions
PCRE uses \x whereas JS regexp uses \u for hexadecimal code sequence.
I think it would be better to support only \u for better compatibility. This could be done at compile-time but would only account for "constant" regexps. Instead, you can set the PCRE_JAVASCRIPT_COMPAT option when compiling patterns with PCRE, as explained here: https://www.pcre.org/original/doc/html/pcrepattern.html#SEC5
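For illustration, here is a minimal sketch of the pattern-rewriting idea: it turns each JS-style \uhhhh escape into the PCRE form \x{hhhh} before the pattern reaches the regex engine. The function name rewriteUnicodeEscapes is hypothetical and is not part of hxcpp or PCRE. It assumes exactly four hex digits after \u, does not combine surrogate pairs, and the resulting \x{hhhh} escapes above 0xFF would still need PCRE's UTF mode in the 8-bit library.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not an hxcpp or PCRE API): rewrites JS-style \uhhhh
// escapes in a regex pattern into PCRE-style \x{hhhh}.
std::string rewriteUnicodeEscapes(const std::string& pattern) {
    std::string out;
    out.reserve(pattern.size() + 8);
    std::size_t i = 0;
    while (i < pattern.size()) {
        // Ordinary character, or a lone trailing backslash: copy as-is.
        if (pattern[i] != '\\' || i + 1 >= pattern.size()) {
            out += pattern[i++];
            continue;
        }
        // Backslash followed by 'u' and exactly four hex digits?
        bool isU = pattern[i + 1] == 'u' && i + 6 <= pattern.size();
        for (std::size_t k = i + 2; isU && k < i + 6; ++k) {
            isU = std::isxdigit(static_cast<unsigned char>(pattern[k])) != 0;
        }
        if (isU) {
            out += "\\x{" + pattern.substr(i + 2, 4) + "}";
            i += 6;
        } else {
            // Copy any other escape as a two-character unit, so an escaped
            // backslash followed by a literal 'u' is never misread.
            out += pattern.substr(i, 2);
            i += 2;
        }
    }
    return out;
}
```

Because the rewrite works on the pattern string, it could in principle also be applied at run time to non-constant patterns, at the cost of an extra pass over each pattern.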
With PCRE2, PCRE_JAVASCRIPT_COMPAT was removed and replaced with PCRE2_ALT_BSUX, PCRE2_ALLOW_EMPTY_CLASS and PCRE2_MATCH_UNSET_BACKREF. See 2015-01-05: [pcre-dev] PCRE2 is released:
[...] The PCRE_JAVASCRIPT_COMPAT option has been split into independent functional options PCRE2_ALT_BSUX, PCRE2_ALLOW_EMPTY_CLASS, and PCRE2_MATCH_UNSET_BACKREF.
PCRE2_ALT_BSUX seems to be the key to adding \u support (and altering \x) more like JS, but there is also PCRE2_EXTRA_ALT_BSUX (which implies PCRE2_ALT_BSUX), adding ECMAScript 6 style \u{hhh..} hexadecimal character codes.
The braced escape construct \N{U+hh..} is also available and should work when Unicode/UTF mode is enabled (regardless of how \x and \u are treated). See pcre2syntax ESCAPED CHARACTERS and pcre2pattern BACKSLASH.