ecma402
What should happen with the usage: "sort" vs "search" choice in implementations?
@gsathya pointed out that, in implementations, this choice is often not passed down to ICU (he was looking at V8, and I see something similar in SpiderMonkey). It's unclear to me what the ICU API is, but UTS #35 mentions separate data for searching. Does anyone have more background on the difference here, and on whether the plumbing is in place to make this a reality?
ICU does offer a way to choose between the two, IIRC. It might be (I have to check) that you call two different APIs (one for search and the other for collation) instead of passing two different option values to a single API, but don't quote me; I may be misremembering.
SpiderMonkey and V8 both seem to respect the usage option.
js> ["AE", "Ä"].sort(new Intl.Collator("de", {usage: "sort"}).compare)
["\xC4", "AE"]
js> ["AE", "Ä"].sort(new Intl.Collator("de", {usage: "search"}).compare)
["AE", "\xC4"]
d8> ["AE", "Ä"].sort(new Intl.Collator("de", {usage: "sort"}).compare)
["Ä", "AE"]
d8> ["AE", "Ä"].sort(new Intl.Collator("de", {usage: "search"}).compare)
["AE", "Ä"]
Looks like we need to pass the correct value to the -u-co- extension in icu::Locale.
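Expressed in JavaScript rather than C++, the idea is that `usage: "search"` turns into a `co` keyword with the value `search` on the locale handed to ICU. A minimal sketch; `toIcuCollationLocale` is a hypothetical name, and `Intl.Locale` merely stands in for `icu::Locale` because it can hold the keyword (it checks syntax only, not whether `Intl.Collator` would accept the value).

```javascript
// Hypothetical helper: map the ECMA-402 `usage` option onto the -u-co-
// keyword that ICU understands. Engines do the equivalent in C++ on an
// icu::Locale; Intl.Locale is only a stand-in here.
function toIcuCollationLocale(tag, usage) {
  if (usage === "search") {
    return new Intl.Locale(tag, { collation: "search" }).toString();
  }
  // usage "sort": keep whatever collation the tag already requests.
  return new Intl.Locale(tag).toString();
}

console.log(toIcuCollationLocale("de", "search")); // "de-u-co-search"
console.log(toIcuCollationLocale("de", "sort"));   // "de"
```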
Hmm, looks like this will make ICU add the -u-co-search extension to the language tag, giving us
d8> new Intl.Collator("de", {usage: "search"}).resolvedOptions().locale
"de-u-co-search"
But according to the ECMA402 spec, "search" is not a valid value for the "co" extension. V8 will have to manually trim the u-co-search subtag from the locale. What was the reason to ban it in the ECMA402 spec?
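Trimming the subtag amounts to string surgery on the -u- extension. A simplified sketch, assuming a canonicalized tag with a lowercase extension in which "u" appears only as the -u- singleton; the helper `stripDisallowedCollation` is hypothetical and not V8's actual code. It drops a `co` keyword only when its value is one ECMA-402 disallows ("search" or "standard") and keeps every other keyword.

```javascript
// Hypothetical helper (not V8's implementation): remove a "co" keyword
// whose value ECMA-402 disallows from the -u- extension of a BCP 47 tag.
// Assumes a canonical, lowercase extension and that "u" only occurs as
// the -u- singleton (e.g. not inside a private-use -x- section).
const DISALLOWED_CO_VALUES = new Set(["search", "standard"]);

function stripDisallowedCollation(tag) {
  const subtags = tag.split("-");
  const u = subtags.indexOf("u");
  if (u === -1) return tag;

  // The -u- extension runs until the next singleton (one-character subtag).
  let end = subtags.length;
  for (let i = u + 1; i < subtags.length; i++) {
    if (subtags[i].length === 1) {
      end = i;
      break;
    }
  }

  const ext = subtags.slice(u + 1, end);
  const kept = [];
  for (let i = 0; i < ext.length; ) {
    if (ext[i].length !== 2) {
      // Attributes (3-8 characters) can only precede the first key.
      kept.push(ext[i]);
      i++;
      continue;
    }
    // A key is two characters; its value runs until the next key.
    let j = i + 1;
    while (j < ext.length && ext[j].length !== 2) j++;
    const key = ext[i];
    const value = ext.slice(i + 1, j).join("-");
    if (!(key === "co" && DISALLOWED_CO_VALUES.has(value))) {
      kept.push(...ext.slice(i, j));
    }
    i = j;
  }

  const head = subtags.slice(0, u);
  const tail = subtags.slice(end);
  // Drop the "u" singleton entirely if nothing is left in the extension.
  const rebuilt = kept.length > 0 ? [...head, "u", ...kept, ...tail] : [...head, ...tail];
  return rebuilt.join("-");
}

console.log(stripDisallowedCollation("de-u-co-search"));    // "de"
console.log(stripDisallowedCollation("de-u-co-search-kn")); // "de-u-kn"
console.log(stripDisallowedCollation("de-u-co-phonebk"));   // "de-u-co-phonebk"
```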
I guess "search" (and "standard") were banned from "co" to force users to use the "usage" option instead. (IIRC there are notes somewhere explaining the motivation for the "usage" option and why it only accepts "search" and "sort", but I don't have a link handy.)
This seems pretty weak, given that users can set numeric or caseFirst through both the locale extension (kn, kf) and the options. Can we change the spec?
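For comparison, here is the dual path that already exists for `numeric`: the `-u-kn` keyword in the locale and the `numeric` option resolve to the same setting. A short illustration using only the standard `Intl.Collator` API:

```javascript
// Numeric collation requested two ways: through the locale's -u-kn keyword
// (a user preference) and through the numeric option (an application need).
const viaExtension = new Intl.Collator("de-u-kn-true");
const viaOption = new Intl.Collator("de", { numeric: true });

for (const collator of [viaExtension, viaOption]) {
  const { locale, numeric } = collator.resolvedOptions();
  // With numeric collation enabled, "2" sorts before "10".
  console.log(locale, numeric, ["10", "2"].sort(collator.compare));
}

// "co-search" has no such dual path: ECMA-402 excludes "search" from the
// valid "co" values, so it is never reported as the resolved collation.
const viaCoSearch = new Intl.Collator("de-u-co-search");
console.log(viaCoSearch.resolvedOptions().collation);
```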
https://norbertlindenberg.com/2012/12/ecmascript-internationalization-api/index.html confirms that this is the reason why "co" doesn't allow "search" and "standard".
Link to meeting notes where "usage" was added: https://docs.google.com/document/d/1-NytPBbsO7dLvt0C2psJkF1Wtt3QdKF3NQAHksF5fyc/edit?hl=ko
Plus https://docs.google.com/document/d/1jCoJ7NU8JDTkBKPCDrSpdCVGd8DTl50iVwXvVhhC21s/edit?hl=sr
The general idea was that options would represent the needs of the application using the API, while the locale reflects the preferences of the user. Of course sometimes there’s overlap, which is why some features can be controlled via both options and locale subtags.
In this case, whether you want sorting or searching behavior (to the extent that there's a difference) is clearly determined by the application feature that uses the collator. This choice is orthogonal to the language preferences of the user.
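A minimal sketch of that split, assuming a hypothetical app with a sorted list view and a filter box: the locale list stands in for whatever the user's preferences are (e.g. `navigator.languages` in a browser), while the feature alone picks `usage`. `collatorFor` and the feature names are made up for illustration.

```javascript
// Hypothetical app helper: locales express the user's preferences,
// while the feature being implemented decides the usage option.
function collatorFor(userLocales, feature) {
  if (feature === "filter") {
    // Matching typed input against items: base sensitivity ignores
    // case and accent differences.
    return new Intl.Collator(userLocales, { usage: "search", sensitivity: "base" });
  }
  // Ordering items for display.
  return new Intl.Collator(userLocales, { usage: "sort" });
}

const userLocales = ["de-DE", "en"]; // e.g. taken from navigator.languages
const listCollator = collatorFor(userLocales, "list");
const filterCollator = collatorFor(userLocales, "filter");

console.log(listCollator.resolvedOptions().usage);       // "sort"
console.log(filterCollator.resolvedOptions().usage);     // "search"
console.log(filterCollator.compare("müller", "MÜLLER")); // 0
```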
@anba what build of V8 are you using that respects it?
PS F:\intl\test262> eshost -tse "['AE', 'Ä'].sort(new Intl.Collator('de', {usage: 'sort'}).compare)"
┌────────────────────────────────┬──────┐
│ ch (Chakra1 x64_debug)         │ Ä,AE │
│ ch (Chakra2 x64_debug)         │      │
│ Chakra (JSVU)                  │      │
│ SpiderMonkey                   │      │
│ V8                             │      │
├────────────────────────────────┼──────┤
│ jshost (unreleased/rs5 ICU 61) │ Ä,AE │
└────────────────────────────────┴──────┘
PS F:\intl\test262> eshost -tse "['AE', 'Ä'].sort(new Intl.Collator('de', {usage: 'search'}).compare)"
┌────────────────────────────────┬──────┐
│ SpiderMonkey                   │ AE,Ä │
├────────────────────────────────┼──────┤
│ ch (Chakra1 x64_debug)         │ Ä,AE │
│ ch (Chakra2 x64_debug)         │      │
│ Chakra (JSVU)                  │      │
│ V8                             │      │
├────────────────────────────────┼──────┤
│ jshost (unreleased/rs5 ICU 61) │ Ä,AE │
└────────────────────────────────┴──────┘
This is using V8 7.0 and SpiderMonkey 62; only SpiderMonkey seems to do anything different for me. For Chakra, I know I explicitly added a comment saying "I have no idea what the difference is between these two modes" and ignored the option value that is retrieved in InitializeCollator.
V8 6.9 should have the correct behavior. I broke it a couple of weeks ago, and I have a fix in review now.
Just put a PR out for Chakra to respect usage properly, thanks to the information in this thread!
https://github.com/Microsoft/ChakraCore/pull/5651
Interesting that multiple engines ran into this issue. Now that we have the test262 locales tag, I think we could write shared tests for this case. Would anyone be interested in that? cc @FrankYFTang @Ms2ger
@markusicu said on another issue that you can't implement searching with just the comparison operation. In the absence of a collator-based search API (which I'm skeptical of adding), what use case does exposing search collations through the comparison operation address?
Since the search collations take up space, it seems bad for implementations to ship them unless they address a useful use case given the API surface that is available. Do they?
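To see why the comparison operation alone falls short, consider the obvious way to build a search on top of it: slide a window the length of the needle over the text and ask the collator whether the two are equal. This is a deliberately naive sketch (the name `naiveCollatorIncludes` is hypothetical), not a proposed API. It breaks as soon as collation-equivalent strings have different lengths, such as a precomposed "é" versus "e" followed by a combining acute accent.

```javascript
// Naive, deliberately limited "search" built only on Collator.compare:
// slide a window the same length (in UTF-16 code units) as the needle
// over the haystack and ask the collator whether the two are equal.
function naiveCollatorIncludes(haystack, needle, collator) {
  for (let start = 0; start + needle.length <= haystack.length; start++) {
    const candidate = haystack.slice(start, start + needle.length);
    if (collator.compare(candidate, needle) === 0) {
      return true;
    }
  }
  return false;
}

const searchCollator = new Intl.Collator("en", { usage: "search", sensitivity: "base" });

// Precomposed é: window and needle have the same length, so it matches.
console.log(naiveCollatorIncludes("R\u00e9sum\u00e9 attached", "resume", searchCollator)); // true
// Decomposed é (e + U+0301): canonically equivalent text, but no window lines up.
console.log(naiveCollatorIncludes("Re\u0301sume\u0301", "resume", searchCollator)); // false
```

A real collation-based search (ICU's `StringSearch`, for instance) has to match variable-length spans and respect combining sequences, which a fixed-length window cannot do.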
2023-02-09 TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2023-02-09.md#what-should-happen-with-the-usage-sort-vs-search-choice-in-implementations-256
Action: Add a sentence or two to MDN and/or the spec better explaining this, then the issue can be closed.