expand set of characters allowed in a capture group name

Open aohan237 opened this issue 5 years ago • 8 comments

Here is the simple example from docs.rs, with only the group name changed to Chinese:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3775118597eecd8c8e4a98475917b9d1

It throws this error:

error: invalid capture group character

aohan237 avatar Jul 11 '19 08:07 aohan237

As the docs say, only [_0-9a-zA-Z] characters are allowed as capture group names:

(?P<name>exp)  named (also numbered) capture group (allowed chars: [_0-9a-zA-Z])

However, this is something that can, and probably should, be expanded. But this isn't a bug, and the error you're getting is correct.
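To illustrate why a Chinese group name is rejected under the documented rule, here is a minimal sketch using only the standard library. The function name `uses_documented_name_chars` is hypothetical, and the check covers only the character set quoted above; the crate's actual parser may impose further constraints that this thread does not spell out.

```rust
/// Hypothetical helper (not part of the regex crate): returns true when
/// `name` is non-empty and every character is in the documented set
/// `[_0-9a-zA-Z]`. Treating the empty name as invalid is an assumption.
fn uses_documented_name_chars(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c == '_' || c.is_ascii_alphanumeric())
}
```

With this check, `uses_documented_name_chars("name")` holds, while a name such as `"名字"` fails because its characters fall outside the ASCII set.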

BurntSushi avatar Jul 11 '19 10:07 BurntSushi

Sorry about that.

Thanks, if you can implement this.

aohan237 avatar Jul 11 '19 10:07 aohan237

Why did you close this? I marked it as an enhancement and updated the title to reflect as such.

BurntSushi avatar Jul 11 '19 10:07 BurntSushi

Before implementation, this requires a careful specification.

BurntSushi avatar Jul 11 '19 10:07 BurntSushi

OK, thanks.

aohan237 avatar Jul 11 '19 10:07 aohan237

Are there any updates on this?

aohan237 avatar Mar 31 '20 02:03 aohan237

If there were, they would be recorded here.

BurntSushi avatar Mar 31 '20 02:03 BurntSushi

Marking this for CJK support, in case someone needs it.

aohan237 avatar Jul 04 '22 10:07 aohan237

The set of characters will be expanded soon to be defined in terms of Unicode. Specifically, here is what the new rule will be:

Capture group names must be any sequence of alpha-numeric Unicode codepoints,
in addition to `.`, `_`, `[` and `]`. Names must start with either an `_` or
an alphabetic codepoint. Alphabetic codepoints correspond to the `Alphabetic`
Unicode property, while numeric codepoints correspond to the union of the
`Decimal_Number`, `Letter_Number` and `Other_Number` general categories.
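The rule above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: the function name `is_valid_capture_name` is hypothetical, and it relies on `char::is_alphabetic` (documented as the Unicode `Alphabetic` property) and `char::is_numeric` (documented as the `Nd`, `Nl` and `No` general categories), whose Unicode version may differ from the one the regex crate uses.

```rust
/// Hypothetical validator for the proposed rule (not the regex crate's code).
/// The first character must be `_` or alphabetic; the rest may be alphabetic,
/// numeric, or one of `.`, `_`, `[`, `]`.
fn is_valid_capture_name(name: &str) -> bool {
    let mut chars = name.chars();

    // An empty name has no valid first character, so it is rejected here.
    match chars.next() {
        Some(c) if c == '_' || c.is_alphabetic() => {}
        _ => return false,
    }

    // Remaining characters: alpha-numeric codepoints plus the four extras.
    chars.all(|c| {
        c.is_alphabetic() || c.is_numeric() || matches!(c, '.' | '_' | '[' | ']')
    })
}
```

Under this sketch, names such as `"名字"` and `"a.b[0]"` pass, while `"1abc"` and `"a-b"` do not.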

BurntSushi avatar Mar 03 '23 17:03 BurntSushi

@aohan237 See above. The new rule will include CJK support.

BurntSushi avatar Mar 03 '23 17:03 BurntSushi

Thanks.

aohan237 avatar Mar 04 '23 00:03 aohan237