
.NET: `\w` (and by extension `\b` and `\B`) don't conform to Unicode

Open Aloso opened this issue 2 years ago • 0 comments

In .NET, `\w` is equivalent to `[\p{L}\p{Mn}\p{Nd}\p{Pc}]` instead of the Unicode-conformant `[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_Control}]`:

  1. It incorrectly uses GC=Letter instead of Alphabetic=Yes; the latter includes more code points.
  2. It matches only GC=Nonspacing_Mark, not all of GC=Mark.
  3. It doesn't match Join_Control=Yes.
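
To make the three gaps concrete, here is a rough Python sketch that models .NET's `\w` purely from the General_Category list given above. It is not .NET's actual implementation; the helper name `dotnet_word_char` and the sample code points are chosen for illustration only. Each sample falls into one of the three gaps: U+2160 is Alphabetic=Yes but GC=Nl, U+0903 is GC=Mc (a spacing mark), and U+200D has Join_Control=Yes.

```python
import unicodedata

# General categories covered by \p{L}, \p{Mn}, \p{Nd}, \p{Pc},
# i.e. the .NET definition of \w as described in this issue.
DOTNET_WORD_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Nd", "Pc"}


def dotnet_word_char(ch: str) -> bool:
    # Hypothetical helper: approximates the .NET word class by
    # General_Category only. It does not call into .NET.
    return unicodedata.category(ch) in DOTNET_WORD_CATEGORIES


# Code points a Unicode-conformant \w should match but this class does not.
GAPS = {
    "\u2160": "ROMAN NUMERAL ONE: Alphabetic=Yes, but GC=Nl rather than a Letter",
    "\u0903": "DEVANAGARI SIGN VISARGA: GC=Mc, a Mark but not a Nonspacing_Mark",
    "\u200d": "ZERO WIDTH JOINER: Join_Control=Yes, GC=Cf",
}

for ch, note in GAPS.items():
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"matched={dotnet_word_char(ch)}  # {note}")
```

Characters such as `a` (Ll), `_` (Pc) and U+0301 COMBINING ACUTE ACCENT (Mn) are matched by both definitions, so the difference only shows up for code points like the ones listed in `GAPS`.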

AFAIK there's nothing we can do other than emit a warning: `\p{Alpha}` doesn't work in .NET, so we can't polyfill it. But a warning adds noise and doesn't help much when there isn't a straightforward fix.

Aloso, Mar 28 '23 13:03