Regex101

PCRE group names can either be ASCII or Unicode, depending on the settings.

Status: Open · danon opened this issue 3 years ago · 5 comments

Bug Description

Currently, regex101 marks perfectly valid named groups as invalid when the group name contains non-ASCII letters.

Reproduction steps

Try this pattern:

/pattern(?<gróup>Foo)/

Expected Outcome

It should be marked as valid if either:

  • the u modifier is used, or
  • the (*UTF) verb is at the beginning of the pattern (I see you don't support it yet).

For ASCII group names, a name is valid if it matches [_A-Za-z][_A-Za-z0-9]*. For Unicode group names, a name is valid if it matches [_\p{L}][_\p{L}\p{Nd}]*.

In any case, when the u modifier is used, pattern(?<gróup>Foo) is definitely a valid pattern.
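To make the two naming rules concrete, here is a minimal Python sketch that checks a group name against them. Python's built-in re module does not support \p{...} classes, so the sketch uses unicodedata.category as a stand-in for \p{L} and \p{Nd}. The function names and the unicode_mode flag (meaning "the u modifier or (*UTF) is in effect") are hypothetical, and the sketch only checks the character classes quoted above, not any other limits PCRE may apply.

```python
import re
import unicodedata

# ASCII rule quoted in the issue: [_A-Za-z][_A-Za-z0-9]*
ASCII_NAME = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")


def is_letter(ch: str) -> bool:
    # Stand-in for \p{L}: general categories Lu, Ll, Lt, Lm, Lo
    return unicodedata.category(ch).startswith("L")


def is_decimal_digit(ch: str) -> bool:
    # Stand-in for \p{Nd}: Unicode decimal digits
    return unicodedata.category(ch) == "Nd"


def is_valid_group_name(name: str, unicode_mode: bool) -> bool:
    """Check a group name against the rules proposed in this issue.

    unicode_mode is a hypothetical flag meaning "the u modifier or (*UTF)
    is in effect"; without it only the ASCII rule applies.
    """
    if not unicode_mode:
        return ASCII_NAME.fullmatch(name) is not None
    if not name:
        return False
    # Unicode rule quoted in the issue: [_\p{L}][_\p{L}\p{Nd}]*
    first, rest = name[0], name[1:]
    if not (first == "_" or is_letter(first)):
        return False
    return all(c == "_" or is_letter(c) or is_decimal_digit(c) for c in rest)


print(is_valid_group_name("gr\u00f3up", unicode_mode=False))  # False
print(is_valid_group_name("gr\u00f3up", unicode_mode=True))   # True
```

Note that this assumes the name is in precomposed form: a decomposed "o" followed by a combining acute accent (category Mn) would fail the Unicode rule as written.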

danon, Sep 21 '22

You're right, and this is a limitation in many regards, since browsers don't have great support for matching Unicode strings yet.

As for supporting (*UTF), you need to use (*UTF16) on regex101.

firasdib, Nov 16 '22

In PHP, (*UTF16) is regarded as an invalid verb.

Could you add a mapping so that the PHP flavors (PCRE and PCRE2) accept (*UTF)?

regex101 is great because I can always just copy and paste a regex and it parses it perfectly. If I have to change (*UTF) to (*UTF16), we lose that convenience.
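One way such a mapping could work, sketched in Python under the assumption that it is a plain text rewrite applied before the pattern reaches an engine that only understands (*UTF16): rewrite (*UTF) only when it appears in the run of verbs at the very start of the pattern. The function name map_utf_verb and the verb-matching regex are hypothetical; this is not necessarily how regex101 implements it.

```python
import re

# Hypothetical matcher for the run of start-of-pattern verbs, e.g.
# (*UTF), (*UCP), (*LIMIT_MATCH=10); only verbs at the very start count.
LEADING_VERBS = re.compile(r"(?:\(\*[A-Z_][A-Z0-9_]*(?:=\d+)?\))*")


def map_utf_verb(pattern: str) -> str:
    """Rewrite a leading (*UTF) verb to (*UTF16) and leave the rest untouched."""
    m = LEADING_VERBS.match(pattern)  # always matches, possibly empty
    head, tail = m.group(0), pattern[m.end():]
    # "(*UTF16)" does not contain the substring "(*UTF)", so this is idempotent
    return head.replace("(*UTF)", "(*UTF16)") + tail


print(map_utf_verb("(*UCP)(*UTF)abc"))  # (*UCP)(*UTF16)abc
print(map_utf_verb("abc(*UTF)"))        # abc(*UTF)  (not a leading verb)
```

Restricting the rewrite to the leading verb run keeps a literal "(*UTF)" later in the pattern from being altered.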

danon, Nov 16 '22

Yes, I can do that.

firasdib, Nov 16 '22

Thanks, that would be great.

danon, Nov 16 '22

@Danon Sorry for the delay; support for (*UTF) will be live within the hour.

firasdib, Dec 10 '22