regex
expand set of characters allowed in a capture group name
Here is the simple example from docs.rs, with only the group name changed to Chinese:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3775118597eecd8c8e4a98475917b9d1
It throws this error:
error: invalid capture group character
As the docs say, only [_0-9a-zA-Z]
characters are allowed as capture group names:
(?P<name>exp) named (also numbered) capture group (allowed chars: [_0-9a-zA-Z])
However, this is something that can, and probably should be expanded. But this isn't a bug and the error you're getting is correct.
Sorry about that.
Thanks. It would be great if you could implement this.
Why did you close this? I marked it as an enhancement and updated the title to reflect as such.
Before implementation, this requires a careful specification.
OK, thanks.
Are there any updates on this?
If there were, they would be recorded here.
+1 for CJK support, in case anyone else needs it.
The set of characters will be expanded soon to be defined in terms of Unicode. Specifically, here is what the new rule will be:
Capture group names must be any sequence of alpha-numeric Unicode codepoints,
in addition to `.`, `_`, `[` and `]`. Names must start with either an `_` or
an alphabetic codepoint. Alphabetic codepoints correspond to the `Alphabetic`
Unicode property, while numeric codepoints correspond to the union of the
`Decimal_Number`, `Letter_Number` and `Other_Number` general categories.
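To make the proposed rule concrete, here is a minimal std-only sketch that checks a name against both the currently documented ASCII rule and the new Unicode rule. The helper names `allowed_by_ascii_rule` and `allowed_by_unicode_rule` are hypothetical and are not part of the `regex` crate's API. The sketch assumes the rule maps onto Rust's standard `char` methods: `char::is_alphabetic` tests the `Alphabetic` property, and `char::is_alphanumeric` is `is_alphabetic() || is_numeric()`, where `is_numeric` covers the `Nd`, `Nl` and `No` general categories. The ASCII check is a literal reading of the docs table and does not model any extra restrictions the real parser may apply.

```rust
/// Literal reading of the documented rule: every char in `[_0-9a-zA-Z]`.
/// Hypothetical helper; the real parser may impose further restrictions.
fn allowed_by_ascii_rule(name: &str) -> bool {
    !name.is_empty() && name.chars().all(|c| c == '_' || c.is_ascii_alphanumeric())
}

/// Sketch of the proposed Unicode rule. Hypothetical helper, not the crate's API.
fn allowed_by_unicode_rule(name: &str) -> bool {
    let mut chars = name.chars();
    // The first codepoint must be `_` or have the `Alphabetic` property.
    match chars.next() {
        Some(c) if c == '_' || c.is_alphabetic() => {}
        _ => return false, // empty name, or a bad first codepoint
    }
    // Remaining codepoints: `Alphabetic`, or general category Nd/Nl/No,
    // or one of the four extra ASCII characters.
    chars.all(|c| c.is_alphanumeric() || matches!(c, '_' | '.' | '[' | ']'))
}

fn main() {
    for name in ["year", "年份", "_x.y[0]", "1st", "a-b"] {
        println!(
            "{:10} ascii={:5} unicode={}",
            name,
            allowed_by_ascii_rule(name),
            allowed_by_unicode_rule(name)
        );
    }
}
```

CJK ideographs such as `年` have general category `Lo`, which is part of `Alphabetic`, so a name like `年份` is rejected by the ASCII rule but accepted by the Unicode rule. A name starting with a digit, such as `1st`, is rejected by the Unicode rule because the first codepoint must be `_` or alphabetic.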
@aohan237 See above. The new rule will include CJK support.
Thanks