
Globbing

Open isomarcte opened this issue 1 year ago • 0 comments

https://github.com/typelevel/case-insensitive/blob/main/core/src/main/scala/org/typelevel/ci/package.scala#L34

The Unicode standard provides several ways to do case folding (the operation that yields a caseless string), with different trade-offs between space usage and strictness. In general, we would like the default behavior to be a full case-folded string using Canonical Equivalence between characters. This is modeled as CanonicalFullCaseFoldedString in the WIP PR #232.

In 1.x.x of case-insensitive we have a globbing matcher. The current implementation is based on the 1.x.x default case-folded string, which I think (though I am not 100% sure) is the same as a simple canonical case-folded string.

The distinction here between "simple" and "full" is that a simple case fold never changes the number of char values needed to represent the string, whereas a full case fold may.
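To make the length difference concrete, here is a minimal sketch using only the JDK. As far as I know the JDK standard library has no dedicated case-folding call, so both helpers below are approximations built from upper/lower case mappings, and the names `simpleFoldApprox` and `fullFoldApprox` are made up for this illustration. They are not the library's implementation.

```scala
import java.util.Locale

object FoldSketch {
  // Approximates a simple fold: each Char is mapped on its own,
  // so the result always has the same length as the input.
  def simpleFoldApprox(s: String): String =
    s.map(c => Character.toLowerCase(Character.toUpperCase(c)))

  // Approximates a full fold: string-level mappings are allowed to
  // expand one Char into several (for example U+00DF into "ss").
  def fullFoldApprox(s: String): String =
    s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)
}
```

With the input `"Stra\u00dfe"` (6 char values), the simple approximation keeps the sharp s and returns 6 char values, while the full approximation returns `"strasse"`, which needs 7.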

In 2.x.x we'd like all the default code paths to use full case-folded operations, as they are the most correct (incorrect folding can introduce security issues and runtime failures for certain RFCs). However, I'm not 100% sure we can implement globbing safely for a full case-folded string, because the same glyph may be represented by either N or N+M characters (where M is usually 1, 2, or 3). See combining sequences.
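A rough sketch of the kind of problem I have in mind. The matcher below is a hypothetical, deliberately naive glob (it is not the 1.x.x implementation), where `?` consumes exactly one char value. `fullFoldApprox` is again an assumed stand-in for a real full case fold, and `nfc` just wraps the JDK's `java.text.Normalizer`.

```scala
import java.text.Normalizer
import java.util.Locale

object GlobSketch {
  // Hypothetical naive matcher: '?' matches exactly one Char,
  // '*' matches any (possibly empty) run of Chars.
  def globMatches(pattern: String, input: String): Boolean = {
    def go(p: Int, i: Int): Boolean =
      if (p == pattern.length) i == input.length
      else
        pattern.charAt(p) match {
          case '*' => go(p + 1, i) || (i < input.length && go(p, i + 1))
          case '?' => i < input.length && go(p + 1, i + 1)
          case c   => i < input.length && input.charAt(i) == c && go(p + 1, i + 1)
        }
    go(0, 0)
  }

  // Assumed stand-in for a full case fold (may change the length).
  def fullFoldApprox(s: String): String =
    s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)

  // Canonical composition, so "e" + U+0301 becomes U+00E9.
  def nfc(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFC)
}
```

Under these assumptions, `globMatches("caf?", "caf\u00e9")` succeeds but `globMatches("caf?", "cafe\u0301")` does not, even though both inputs render as the same glyphs; composing the input with `nfc` first makes it match again. Similarly, `"stra?e"` matches `"stra\u00dfe"` but no longer matches once the input has been through `fullFoldApprox`, because the single char the `?` was standing in for has become two.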

I will follow up with some more concrete examples shortly.

If we can't adapt this for a full case folded string, we will need to deprecate it or leave it as a simple case folded implementation if we want to avoid a bincompat break.

isomarcte, Jan 04 '23 14:01