glob-match
Unicode characters vs codepoints
Hi! It would be nice if this library specified how it handles multi-codepoint characters or graphemes (🎉). I was comparing it against the doublestar Go library (https://github.com/bmatcuk/doublestar), which seems to handle Unicode, whereas this library evaluates globs at the codepoint level, so certain things don't line up.
Example: a[^b]c matches acc, but not a🔥c. Of course, emoji are a simple example, but there is a large volume of 'regular' Unicode, such as characters from other languages, that could end up in paths. I am willing to contribute (and have started on) a feature-flag toggle that allows for this, since looking for grapheme boundaries will presumably be more expensive than simply going char by char.
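To make the mismatch concrete, here is a minimal sketch of the fixed pattern a[^b]c matched at two different granularities. Both helpers (`matches_bytes`, `matches_chars`) are hypothetical and written only for illustration; they are not glob-match's implementation. Note that 🔥 is a single code point (U+1F525) encoded as four UTF-8 bytes, so the result reported above is what a byte-wise matcher would produce; whether that is what the library does internally is an assumption here.

```rust
/// Hypothetical helper: matches the fixed pattern `a[^b]c`,
/// letting `[^b]` consume exactly one UTF-8 byte.
fn matches_bytes(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 3 && b[0] == b'a' && b[1] != b'b' && b[2] == b'c'
}

/// Hypothetical helper: same pattern, but `[^b]` consumes exactly
/// one `char` (a Unicode scalar value).
fn matches_chars(s: &str) -> bool {
    let mut it = s.chars();
    matches!(
        (it.next(), it.next(), it.next(), it.next()),
        (Some('a'), Some(m), Some('c'), None) if m != 'b'
    )
}
```

With these, `matches_bytes("a🔥c")` is false while `matches_chars("a🔥c")` is true. A multi-codepoint grapheme such as e followed by a combining acute accent (`"ae\u{301}c"`) fails both, which is the case that would need real grapheme segmentation. The Rust standard library does not provide that; a crate such as unicode-segmentation would be one option.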
I would not expect this to work with ranges (to me that should be undefined), though we could compare code point values as u32, i.e. low_u32 <= c <= high_u32.
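A minimal sketch of that range check, assuming the class bounds and the candidate are each a single `char`. The function name `in_range` is hypothetical and not part of the library:

```rust
/// Hypothetical helper: is `c` inside the inclusive class range
/// `[low-high]`, comparing Unicode scalar values as u32?
fn in_range(c: char, low: char, high: char) -> bool {
    let v = c as u32;
    (low as u32) <= v && v <= (high as u32)
}
```

This orders characters purely by code point, so `in_range('β', 'α', 'ω')` holds but `in_range('é', 'a', 'z')` does not. It says nothing about multi-codepoint graphemes, which have no single u32 value to compare.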
Thanks for the lib!
Alex