Sort with ignore case

Open Aldjinn opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe.
I would like to be able to sort from A-Z or Z-A while ignoring case. I have entries starting with both UPPER and lower case letters, so (for me) it would be nice to ignore case during sorting.

Describe the solution you'd like
A checkbox to ignore case during sorting, right next to A-Z and Z-A.

Aldjinn avatar Jun 17 '24 15:06 Aldjinn

Hi, I've recently realized that sorting is case sensitive, it's already on my to-do list 👍🏻

Bubka avatar Jun 18 '24 08:06 Bubka

The current implementation is not even "case-sensitive"; it is just Unicode code point sorting. Both case-sensitive and case-insensitive sorting are language-dependent, so I recommend using a.localeCompare(b, <app's display language>).

const list = ["SASU", "säsä", "SISU", "sisi", "TATA", "ZÖZU", "zyzy"];

list.toSorted((a, b) => a > b ? 1 : -1); // current
// => ["SASU","SISU","TATA","ZÖZU","sisi","säsä","zyzy"]
list.toSorted((a, b) => a.toLowerCase() > b.toLowerCase() ? 1 : -1); // simple lowercase
// => ["SASU","sisi","SISU","säsä","TATA","zyzy","ZÖZU"]

const englishName = new Intl.DisplayNames(["en"], {type: "language"});
Object.fromEntries(["en", "fr", "sv", "et", "tr"].map((ln) => [englishName.of(ln), list.toSorted((a, b) => a.localeCompare(b, ln))]));
// {
//   "English":  ["säsä","SASU","sisi","SISU","TATA","ZÖZU","zyzy"],
//   "French":   ["säsä","SASU","sisi","SISU","TATA","ZÖZU","zyzy"],
//   "Swedish":  ["SASU","sisi","SISU","säsä","TATA","zyzy","ZÖZU"],
//   "Estonian": ["SASU","sisi","SISU","säsä","ZÖZU","zyzy","TATA"],
//   "Turkish":  ["säsä","SASU","SISU","sisi","TATA","ZÖZU","zyzy"]
// }

yheuhtozr avatar Jul 03 '24 04:07 yheuhtozr

Thx for the suggestion 👍🏻

Bubka avatar Jul 04 '24 12:07 Bubka

Hi, thank you for your work, but I got a little confused when I checked the implementation. Do you want, for example, ["ac","Ab","AB","Ac","ab","AC"] to be sorted like ["Ab","AB","Ac","AC","ab","ac"] (when case-sensitive sort is on)?

yheuhtozr avatar Aug 13 '24 20:08 yheuhtozr

@yheuhtozr thx for your attention.

I just pushed another version of the sort function: the original implementation is back, but with accented characters treated as non-accented characters. It's not locale-specific sorting, but it's better than having them listed after non-accented uppercase items.
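A minimal sketch of what "accented characters treated as non-accented" could look like, for readers following along. This is not the code from the commit: `stripDiacritics` and `accentFoldedSort` are hypothetical names, and the approach (NFD normalization followed by removing combining marks) is an assumption about one common way to fold accents.

```javascript
// Hypothetical helper: decompose each character (NFD), then drop combining marks.
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// Plain comparison of JavaScript strings (UTF-16 code units), as in the original sort.
function compareRaw(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical sorter: same locale-independent order as before,
// but "ä" is compared as "a", "Ö" as "O", and so on.
function accentFoldedSort(names) {
  return [...names].sort((a, b) =>
    compareRaw(stripDiacritics(a), stripDiacritics(b))
  );
}

const sample = ["SASU", "säsä", "SISU", "sisi", "TATA", "ZÖZU", "zyzy"];
const folded = accentFoldedSort(sample);
console.log(folded);
// => ["SASU","SISU","TATA","ZÖZU","säsä","sisi","zyzy"]
```

Compared with the plain code point result, "säsä" now sorts before "sisi", while uppercase entries still come before all lowercase ones.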

The case-insensitive sorting remains untouched, it uses localeCompare so the sorting is ok. Any opinion on uppercase or lowercase first?

Bubka avatar Sep 11 '24 07:09 Bubka

@Bubka Sorry for being away, and for getting confused in the conversation myself, but it looks like a lot of concepts have been mixed up under the name of "case sensitivity", so I am not exactly sure what goal you actually intend.

Let's call them by different names (in the examples below, x > y means x is sorted before y):

  • Case axis
    • case-merged: e.g. [A=a, B=b, ...] (so AsC > ASZ > AZa)
    • case-unmerged: e.g. [A, a, B, b, ...] (so ASZ > AsC > AZa)
    • case-separated: e.g. [A, B, ..., a, b, ...] (so ASZ > AZa > AsC)
    • case-unsorted: not having exact rules for casing
  • Variant axis
    • variant-merged: e.g. [a=á, e=é, ...]
    • variant-unmerged: e.g. [a, á, e, é, ...]
    • variant-separated: e.g. [a, e, ..., á, é, ...] (← but few people want this one, I guess)
    • variant-unsorted: not having exact rules for variants
  • Locale axis
    • locale-independent: order not changed by locale
    • locale-dependent: order can change by locale
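To make the Case axis concrete, here is a small sketch that sorts the three example words from the list under each of the three rules. It only handles ASCII letters, and the comparator names (`caseMerged`, `caseUnmerged`, `caseSeparated`) are made up for illustration; they are not part of any API or of 2FAuth.

```javascript
const words = ["AZa", "AsC", "ASZ"];

// Raw comparison of JavaScript strings (UTF-16 code units).
const compareRaw = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// case-merged: A=a, B=b, ... (case is ignored entirely)
const caseMerged = (a, b) => compareRaw(a.toLowerCase(), b.toLowerCase());

// case-unmerged: A, a, B, b, ... (letter first, then uppercase before lowercase)
// Each character becomes its lowercase letter plus "0" (upper) or "1" (lower).
const unmergedKey = (text) =>
  [...text]
    .map((ch) => ch.toLowerCase() + (ch === ch.toLowerCase() ? "1" : "0"))
    .join("");
const caseUnmerged = (a, b) => compareRaw(unmergedKey(a), unmergedKey(b));

// case-separated: A, B, ..., a, b, ... (for ASCII this is just the raw order)
const caseSeparated = compareRaw;

const mergedOrder = [...words].sort(caseMerged);
const unmergedOrder = [...words].sort(caseUnmerged);
const separatedOrder = [...words].sort(caseSeparated);

console.log(mergedOrder);    // => ["AsC","ASZ","AZa"]
console.log(unmergedOrder);  // => ["ASZ","AsC","AZa"]
console.log(separatedOrder); // => ["ASZ","AZa","AsC"]
```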

So we have tons of possible options, but first of all, we were never clear about what "case-sensitive" means: case-unmerged or case-separated, as I named them.

  • The original implementation (simple code point sorting) was:

    • locale-independent, and
    • looks case-separated in the ASCII range only, otherwise mostly case-unsorted, and
    • variant-unsorted (because it never affects ASCII)
  • The "case-insensitive" branch in 5d3a1be, which is also what I suggested, was:

    • locale-dependent, and
    • mostly case-merged: localeCompare actually does something smart: it first tries a case-merged comparison, then falls back to a case-unmerged comparison among the items that showed no difference
    • mostly variant-merged: same as above
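The "merged first, then tie-break" behavior can be observed with Intl.Collator, which exposes the same comparison as localeCompare. This is only an illustrative sketch: the locale "en" and the variable names are assumptions, and the exact tie-break direction (which case comes first) depends on the locale data, so it is not asserted here.

```javascript
const locale = "en"; // assumed locale for the demonstration

// Compares base letters only: case and accents are merged.
const baseOnly = new Intl.Collator(locale, { sensitivity: "base" });

// Default collator, equivalent to a.localeCompare(b, locale).
const full = new Intl.Collator(locale);

const mergedCase = baseOnly.compare("a", "A");    // 0: same base letter
const mergedAccent = baseOnly.compare("a", "ä");  // 0: same base letter
const tieBroken = full.compare("a", "A") !== 0;   // true: case breaks the tie

// A difference in base letters wins over any difference in case:
const letterBeatsCaseA = full.compare("ab", "Ac") < 0; // true, because b < c
const letterBeatsCaseB = full.compare("Ab", "ac") < 0; // true, because b < c

console.log({ mergedCase, mergedAccent, tieBroken, letterBeatsCaseA, letterBeatsCaseB });
```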

And I'm not sure what you tried to do with "case-sensitive" branches in 5d3a1be and d90ffd5.

  • Does 5d3a1be perhaps aim for a version that is locale-dependent, case-separated only for Latin letters (case-merged otherwise), and variant-merged? If so, it does not seem to work correctly.
  • Does d90ffd5 perhaps aim for a version that is locale-independent, case-separated for Latin letters, and variant-merged for Latin letters?
    • But it would give wrong alphabetical orders for most non-Russian Cyrillic languages, and it is variant-unsorted for most non-European languages.

Generally, I don't really know which aspects of the old implementation's "case-sensitive" behavior you want to retain, or to what degree.

Do you want to mix the old behavior and locale support in some combination? Do you actually want more than two sorting options? Do you perhaps need help creating another "case-sensitive" version using localeCompare?

yheuhtozr avatar Oct 02 '24 10:10 yheuhtozr

Thx very much @yheuhtozr for the axis decomposition, it's very useful. And I must admit, I hadn't thought of all those combinations 🤯

What I was aiming for with the case-sensitive branch is case-separated + variant-merged + locale-dependent. I haven't found a way to achieve this with a single localeCompare call (is it even possible/relevant for each language?!) so I gave up on the locale-dependent axis. For me, as a native French (Latin-script) speaker, the Case axis should be case-separated. Locale-dependent ordering should be applied too, to stick as closely as possible to each user's culture. I don't mind the Variant axis as long as it's not variant-separated. On top of that, I want to keep it simple, so no additional sorting options, and it would be nice to have an algorithm that is not too complicated 😇 😄

Any help would be appreciated.

Bubka avatar Oct 25 '24 15:10 Bubka

> What I was aiming for with the case-sensitive branch is case-separated + variant-merged + locale-dependent. I haven't found a way to achieve this with a single localeCompare call (is it even possible/relevant for each language?!) so I gave up on the locale-dependent axis.

Yes, in the bigger picture there is no idiomatic way to write a case-separated sort in general (you can only loop over the characters manually), which means handling some of the low-level Unicode work yourself. Sample code that satisfies your description would look like this:

const locale = "en"; // placeholder: use the current display language here
const namesToSort = ["ac", "Ab", "AB", "Ac", "ab", "AC"]; // placeholder: the array of names to sort

// Sorter for reuse.
// It could live inside the function if the sort only runs once per page.
const caseSensitiveSorter = new Intl.Collator(locale, {
  sensitivity: "case", // <- compare base letters and case, ignore accents
  caseFirst: "upper", // <- make sure 'A' sorts before 'a' (not 'a' before 'A')
});

// Locale-dependent segmenter; it could also live inside the function.
// Firefox only supports Intl.Segmenter from version 125, so fall back to a simple [...string] if that is a concern.
const segmenter = new Intl.Segmenter(locale, {granularity: "grapheme"});

// Sort function.
// I don't know what is optimal, but it returns a new sorted list based on the input list.
function caseSensitiveSort(list, sorter, segmenter) {
  // Split every name into graphemes so they can be compared position by position.
  const segList = list.map((e) => [...segmenter.segment(e)].map((s) => s.segment));
  segList.sort(function(a, b) {
    for (let i = 0; ; i++) {
      const result = sorter.compare(a[i] || '', b[i] || '');
      // Stop at the first difference, or when either name runs out of graphemes.
      if (result !== 0 || !a[i] || !b[i]) { return result; }
    }
  });
  return segList.map((e) => e.join(''));
}

const sortedNames = caseSensitiveSort(namesToSort, caseSensitiveSorter, segmenter); // execute sorting
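
The fallback mentioned in the comment about Firefox could be sketched as follows. `splitGraphemes` is a hypothetical helper name, not something from the code above or from 2FAuth. Note that the `[...text]` fallback splits by code point, so a base letter followed by a combining mark becomes two items instead of one.

```javascript
// Hypothetical helper: split a string into user-perceived characters,
// using Intl.Segmenter when available and plain code points otherwise.
function splitGraphemes(text, locale) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const graphemeSegmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
    return Array.from(graphemeSegmenter.segment(text), (s) => s.segment);
  }
  // Fallback: code points only, combining marks are not grouped.
  return [...text];
}

console.log(splitGraphemes("säsä", "en")); // => ["s","ä","s","ä"]
```

A helper like this could replace the direct segmenter.segment(e) call inside the sort function if older browsers need to be supported.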

yheuhtozr avatar Nov 10 '24 20:11 yheuhtozr