case-insensitive Unicode Case Folding

Unicode Case Folding

Open isomarcte opened this issue 2 years ago • 12 comments

While working on cats-uri, I ran into an issue with how CIString handles certain Unicode values, which led me to notice that it doesn't respect caseless matching as defined by the Unicode standard. As it turns out, neither does String.equalsIgnoreCase.
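
A minimal sketch of the String.equalsIgnoreCase gap, using only the JVM standard library. The object and value names are hypothetical and are not part of case-insensitive or cats-uri.

```scala
import java.util.Locale

// Hypothetical demo object, for illustration only.
object EqualsIgnoreCaseDemo {
  val sharpS  = "ß"
  val doubleS = "SS"

  // equalsIgnoreCase compares the two strings char by char,
  // so strings of different lengths can never match.
  val simpleMatch: Boolean = sharpS.equalsIgnoreCase(doubleS)

  // Full upper-casing, by contrast, applies the one-to-many mapping for ß.
  val upper: String = sharpS.toUpperCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    println(simpleMatch) // false
    println(upper)       // SS
  }
}
```

The mismatch comes from the one-to-many mapping: a per-character comparison has no way to line up one `ß` against two `S` characters.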

I'd just about completed a branch to implement full case folding as defined by the Unicode standard when I ran across this test.

  test("character based equality") {
    assert(CIString("ß") != CIString("SS"))
  }

Since under the Unicode standard's caseless matching these two strings would compare equal, I'm beginning to think we are intentionally not following the standard here. Is that the case? If so, why? Is it to maintain parity with what the Java standard library does with methods like equalsIgnoreCase?
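
To illustrate why the two strings compare equal under caseless matching, here is a rough sketch that approximates full case folding by upper-casing and then lower-casing with Locale.ROOT. This is an assumption made for illustration: it is not the Unicode CaseFolding.txt mapping, it is not what the branch mentioned above implements, and it can differ from true case folding for some code points. The names approxFold and caselessEquals are hypothetical. A library such as ICU4J (UCharacter.foldCase) would be the more faithful option.

```scala
import java.util.Locale

// Hypothetical helpers, for illustration only.
object ApproxCaseFold {
  // Approximation of full case folding: upper-case, then lower-case.
  // Not the Unicode CaseFolding.txt mapping.
  def approxFold(s: String): String =
    s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)

  // Caseless comparison built on the approximate fold.
  def caselessEquals(a: String, b: String): Boolean =
    approxFold(a) == approxFold(b)

  def main(args: Array[String]): Unit = {
    println(caselessEquals("ß", "SS"))          // true
    println(caselessEquals("Straße", "STRASSE")) // true
    println("ß".equalsIgnoreCase("SS"))         // false
  }
}
```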

Feb 05 '22 20:02 isomarcte
