.NET: `\w` (and by extension `\b` and `\B`) don't conform to Unicode
`\w` is equivalent to `[\p{L}\p{Mn}\p{Nd}\p{Pc}]` in .NET instead of `[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_Control}]`:
- It incorrectly uses `GC=Letter` instead of `Alphabetic=Yes`; the latter includes more code points!
- It doesn't match all of `GC=Mark`, only `GC=Nonspacing_Mark`
- It doesn't match `Join_Control=Yes`
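
To make the three gaps concrete, here is a rough Python sketch. It does not run .NET; it only models the character class described above using general categories from the standard `unicodedata` module. The helper name `dotnet_word_char` and the sample code points are my own picks for illustration, and the exact categories depend on the Unicode version shipped with your Python.

```python
import unicodedata

# General categories that .NET's \w covers, per the description above:
# \p{L} (all letter subcategories), \p{Mn}, \p{Nd}, \p{Pc}.
DOTNET_WORD_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Nd", "Pc"}


def dotnet_word_char(ch: str) -> bool:
    """Hypothetical model of .NET's \\w, based only on general categories."""
    return unicodedata.category(ch) in DOTNET_WORD_CATEGORIES


# One sample code point per gap listed above (illustrative choices).
SAMPLES = {
    "\u2160": "ROMAN NUMERAL ONE: GC=Nl, Alphabetic=Yes, but not GC=Letter",
    "\u0903": "DEVANAGARI SIGN VISARGA: GC=Mc, a mark but not a nonspacing one",
    "\u200d": "ZERO WIDTH JOINER: GC=Cf, Join_Control=Yes",
}

for ch, note in SAMPLES.items():
    print(
        f"U+{ord(ch):04X} gc={unicodedata.category(ch)} "
        f"modelled \\w match={dotnet_word_char(ch)}  ({note})"
    )
```

All three samples would be word characters under the Unicode-conformant definition, but the modelled .NET class rejects them.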
AFAIK there's nothing we can do other than emitting a warning: `\p{Alpha}` doesn't work in .NET, so we can't polyfill it. But a warning adds noise and doesn't help much when there isn't a straightforward fix.