Add support for /u PCRE modifier
Handling of Unicode character classes (such as \p{Lu}) with the PCRE engine is a bit wonky, because by default PHP's preg_* functions treat the pattern and the input string as sequences of single bytes (effectively ISO-8859-1) instead of UTF-8.
For instance, see https://regexr.com/5sf68. The string "CHÊNE" should be matched entirely by the regex, but it actually produces two matches, split at the Ê.
In PHP, the /u modifier can be used to force the input to be treated as UTF-8. The following code returns the correct result:
preg_match_all('/\p{Lu}+/u', 'CHÊNE', $matches);
var_dump($matches);
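To see the difference side by side, here is a minimal sketch that runs the same pattern with and without the modifier. It assumes the script file is saved as UTF-8 (so that Ê is stored as the two bytes 0xC3 0x8A); the variable names $subject, $withoutU and $withU are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8, so 'Ê' occupies two bytes (0xC3 0x8A).
$subject = 'CHÊNE';

// Without /u, PCRE processes the subject one byte at a time.
preg_match_all('/\p{Lu}+/', $subject, $withoutU);

// With /u, both the pattern and the subject are treated as UTF-8.
preg_match_all('/\p{Lu}+/u', $subject, $withU);

// bin2hex() exposes the raw bytes of each match, which makes it
// visible where the byte-wise run splits the string.
var_dump(array_map('bin2hex', $withoutU[0]));
var_dump(array_map('bin2hex', $withU[0]));
```

Dumping the matches as hex rather than as text avoids any confusion caused by the terminal's own encoding when a match ends in the middle of a multi-byte character.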
Would it be possible to add support for this modifier in RegExr? I believe this would also fix #372.