
Add support for /u PCRE modifier

Open mleyen opened this issue 4 years ago • 0 comments

Handling of Unicode character classes (such as \p{Lu}) with the PCRE engine is a bit wonky, because by default PHP's preg_* functions treat input strings as sequences of single bytes (effectively ISO-8859-1) instead of UTF-8.

For instance: https://regexr.com/5sf68. The string "CHÊNE" should be matched entirely by the regex, but it actually produces two matches, split at the Ê.

In PHP, the /u modifier can be used to force the input to be treated as UTF-8. The following code returns the correct result:

preg_match_all('/\p{Lu}+/u', 'CHÊNE', $matches);
var_dump($matches);
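
To make the difference easier to see, here is a minimal sketch that runs the same pattern with and without /u. It assumes the subject is UTF-8 encoded, so it is written with explicit byte escapes (Ê is the two bytes 0xC3 0x8A). The variable names are just for illustration.

```php
<?php
// "CHÊNE" as UTF-8 bytes; Ê (U+00CA) is encoded as 0xC3 0x8A.
$subject = "CH\xC3\x8ANE";

// Without /u, PCRE tests each byte on its own, so the two bytes of Ê
// are classified separately and the run of uppercase letters is broken.
$countBytes = preg_match_all('/\p{Lu}+/', $subject, $byteMatches);

// With /u, pattern and subject are decoded as UTF-8, so Ê is one character.
$countUtf8 = preg_match_all('/\p{Lu}+/u', $subject, $utf8Matches);

printf("without /u: %d match(es)\n", $countBytes);
foreach ($byteMatches[0] as $m) {
    printf("  bytes: %s\n", bin2hex($m)); // hex dump avoids printing invalid UTF-8
}

printf("with /u: %d match(es)\n", $countUtf8);
foreach ($utf8Matches[0] as $m) {
    printf("  bytes: %s\n", bin2hex($m));
}
```

Going by the behaviour described above, this should report two matches without /u and a single match covering the whole string with /u.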

Would it be possible to add support for this modifier in RegExr? I believe this would also fix #372.

mleyen, May 07 '21 14:05