glob-match Unicode characters vs codepoints

Open arlyon opened this issue 1 year ago • 1 comments

Hi! It would be nice if this library specified how it handles multi-codepoint characters, i.e. graphemes (🎉). I was comparing it against the doublestar Go library (https://github.com/bmatcuk/doublestar), which seems to handle Unicode, whereas this library evaluates globs at the codepoint level, so certain things don't line up.

Example: a[^b]c matches acc, but not a🔥c. Of course emoji is a simple example, but there are large volumes of 'regular' Unicode, such as characters from other languages, that could end up in paths. I am willing to contribute (and have started on) a feature-flag toggle that enables this, since looking for grapheme boundaries will presumably be more performance-intensive than simply going char-for-char.
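
To make the mismatch concrete, here is a minimal sketch of how the unit of comparison changes the result for a pattern shaped like a[^b]c. Both helpers (matches_bytewise, matches_charwise) are hypothetical and written only for illustration; they are not glob-match's or doublestar's implementation, and which unit either library really uses is not confirmed here.

```rust
/// Hypothetical helper: match "first, anything except `excluded`, last",
/// consuming exactly one byte per pattern element.
fn matches_bytewise(text: &str, first: u8, excluded: u8, last: u8) -> bool {
    let bytes = text.as_bytes();
    // The negated class consumes a single byte, so any multi-byte
    // UTF-8 sequence in the middle makes the length check fail.
    bytes.len() == 3 && bytes[0] == first && bytes[1] != excluded && bytes[2] == last
}

/// Hypothetical helper: same pattern, consuming exactly one `char`
/// (Unicode scalar value) per pattern element.
fn matches_charwise(text: &str, first: char, excluded: char, last: char) -> bool {
    let mut chars = text.chars();
    match (chars.next(), chars.next(), chars.next(), chars.next()) {
        // Exactly three scalar values, and the middle one is not excluded.
        (Some(a), Some(m), Some(z), None) => a == first && m != excluded && z == last,
        _ => false,
    }
}
```

With these helpers, "acc" matches either way, while "a🔥c" only matches char-wise because 🔥 is one scalar value but four UTF-8 bytes. A string such as "ae\u{301}c" (e followed by a combining acute accent) fails even char-wise, since that single grapheme spans two scalar values; handling it would need grapheme segmentation, which the standard library does not provide.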

I would not expect this to work with ranges (to me that should be undefined), though we could have lowu32 <= var <= highu32, i.e. compare the code points as u32 values.
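
A minimal sketch of that numeric comparison, assuming a range is defined over single Unicode scalar values. The name in_char_range is hypothetical and not part of any existing API.

```rust
/// Hypothetical helper: is `c` inside the class range `[low-high]`,
/// comparing Unicode scalar values numerically as u32?
fn in_char_range(c: char, low: char, high: char) -> bool {
    let value = c as u32;
    (low as u32) <= value && value <= (high as u32)
}
```

This only works for single code points. A grapheme made of several code points has no single u32 value, which is one reason to leave ranges over graphemes undefined.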

Thanks for the lib!

Alex

Apr 06 '23 14:04 arlyon
