Support collator strength and decomposition mode for row ordering Sort
As alluded to in https://github.com/OpenRefine/OpenRefine/issues/6608#issuecomment-2116153415, it would be useful to extend our sorting capabilities to support collators of different strengths as well as different locales.
We currently support IDENTICAL and SECONDARY strengths, depending on whether the user selects caseSensitive or not
https://github.com/OpenRefine/OpenRefine/blob/6f937d0d8f619f59a12abdd7e5a5b84e4b71128c/modules/core/src/main/java/com/google/refine/sorting/StringCriterion.java#L63
but hard-wire the decomposition mode to FULL_DECOMPOSITION and always use the server's default locale (which is likely to match that of the browser in the most common, and all supported, cases)
https://github.com/OpenRefine/OpenRefine/blob/6f937d0d8f619f59a12abdd7e5a5b84e4b71128c/modules/core/src/main/java/com/google/refine/sorting/StringCriterion.java#L58
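To make the effect of the two strengths concrete, here is a minimal sketch using java.text.Collator. It is not the StringCriterion code: the class and method names are hypothetical, the mapping of caseSensitive to IDENTICAL (and of the unchecked state to SECONDARY) is an assumption based on the description above, and the locale is passed in explicitly only so the demo gives the same result on any machine.

```java
import java.text.Collator;
import java.util.Locale;

class CurrentSortBehaviour {
    // Mirrors the behaviour described above, not the actual StringCriterion code.
    // The locale is a parameter only to keep the demo reproducible; the issue
    // says the server's default locale is what is used today.
    static Collator collatorFor(boolean caseSensitive, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setDecomposition(Collator.FULL_DECOMPOSITION);
        // Assumed mapping: case-sensitive -> IDENTICAL, otherwise SECONDARY.
        collator.setStrength(caseSensitive ? Collator.IDENTICAL : Collator.SECONDARY);
        return collator;
    }

    public static void main(String[] args) {
        Collator insensitive = collatorFor(false, Locale.ENGLISH);
        Collator sensitive = collatorFor(true, Locale.ENGLISH);

        // Case is a tertiary difference, so SECONDARY treats these as equal.
        System.out.println(insensitive.compare("apple", "Apple") == 0); // true
        // IDENTICAL distinguishes them.
        System.out.println(sensitive.compare("apple", "Apple") != 0);   // true
        // Accents are a secondary difference, so they still matter at SECONDARY.
        System.out.println(insensitive.compare("resume", "r\u00e9sum\u00e9") != 0); // true
    }
}
```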
Proposed solution
A complete solution would include giving the user control of:
- strength - identical (current), primary, secondary (current), or tertiary
- decomposition - none, canonical, full (current)
- locale - default (current) or selectable from a list of any supported
as described in the Java docs. Options marked (current) are the ones supported today.
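The three options listed above map directly onto java.text.Collator settings. The following is a rough sketch of how they might be combined, not a proposed implementation: the CollatorOptions class, its enums, and the convention that a null locale means "use the default" are all hypothetical names and choices made for illustration.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorOptions {
    // Hypothetical option names mirroring the list above; these are not
    // existing OpenRefine settings.
    enum Strength { PRIMARY, SECONDARY, TERTIARY, IDENTICAL }
    enum Decomposition { NONE, CANONICAL, FULL }

    static Collator build(Strength strength, Decomposition decomposition, Locale locale) {
        // Assumption: a null locale stands for "default", i.e. the JVM
        // (server) default locale.
        Collator collator = (locale == null)
                ? Collator.getInstance()
                : Collator.getInstance(locale);

        switch (strength) {
            case PRIMARY:   collator.setStrength(Collator.PRIMARY); break;
            case SECONDARY: collator.setStrength(Collator.SECONDARY); break;
            case TERTIARY:  collator.setStrength(Collator.TERTIARY); break;
            default:        collator.setStrength(Collator.IDENTICAL); break;
        }

        switch (decomposition) {
            case NONE:      collator.setDecomposition(Collator.NO_DECOMPOSITION); break;
            case CANONICAL: collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION); break;
            default:        collator.setDecomposition(Collator.FULL_DECOMPOSITION); break;
        }
        return collator;
    }

    public static void main(String[] args) {
        // PRIMARY ignores both case and accents, so these compare as equal.
        Collator primary = build(Strength.PRIMARY, Decomposition.FULL, Locale.ENGLISH);
        System.out.println(primary.compare("resume", "R\u00e9sum\u00e9") == 0); // true

        // Locales the JVM can supply collators for; one possible source for
        // a user-selectable list.
        System.out.println(Collator.getAvailableLocales().length);
    }
}
```

Collator.getAvailableLocales() depends on the installed JVM, so the number of selectable locales would vary between deployments.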
Alternatives considered
We could support a subset of the above -- or even leave things the way they are.
This only affects the row ordering done by the Sort function in the UI. It may also be desirable to give the user more fine-grained control over the GREL sort() function for arrays (although it doesn't even have basics such as case-insensitive sorting).
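For the array case, a collator can be used directly as a comparator. The sketch below only illustrates what collator-based array sorting could look like in plain Java. It says nothing about how GREL sort() is implemented, and the ArraySortSketch class and its sortWith method are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class ArraySortSketch {
    // Returns a sorted copy of the input, ordered by a collator of the given
    // strength (one of the java.text.Collator strength constants).
    static String[] sortWith(String[] values, int strength, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        String[] copy = values.clone();
        // Collator implements Comparator<Object>, so it can be passed to sort.
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] input = {"banana", "apple", "Cherry"};

        // Natural String ordering compares code units, so "Cherry" comes first.
        String[] natural = input.clone();
        Arrays.sort(natural);
        System.out.println(Arrays.toString(natural)); // [Cherry, apple, banana]

        // A SECONDARY-strength collator ignores case when ordering.
        String[] collated = sortWith(input, Collator.SECONDARY, Locale.ENGLISH);
        System.out.println(Arrays.toString(collated)); // [apple, banana, Cherry]
    }
}
```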
Additional context
Diacritic-insensitive collation using the server's default locale was first introduced in #202 in 2012. There was a regression in 3.x, which was fixed when #6047 was resolved for 3.8. The fact that it took several years for the regression to be noticed probably means that most people use, and are happy with, the default case-insensitive, diacritic-sensitive, but natural-order sort using the default locale, which perhaps makes this low priority.