ecma402

What should happen with the usage: "sort" vs "search" choice in implementations?

Open littledan opened this issue 7 years ago • 13 comments

@gsathya pointed out that, in implementations, this choice is often not passed down to ICU (he was looking at V8, and I see something similar in SpiderMonkey). It's unclear to me what the ICU API is, but UTS #35 mentions separate data for searching. Does anyone have more background on the difference here, and on whether the plumbing is in place to make this a reality?

littledan avatar Aug 06 '18 19:08 littledan

ICU does offer ways to choose between the two, IIRC. The way might be (I have to check) to call two different APIs (one for search and the other for collation) instead of passing two different option values to a single API, but don't quote me. I may well be misremembering.

jungshik avatar Aug 07 '18 21:08 jungshik

SpiderMonkey and V8 both seem to respect the usage option.

js> ["AE", "Ä"].sort(new Intl.Collator("de", {usage: "sort"}).compare)
["\xC4", "AE"]
js> ["AE", "Ä"].sort(new Intl.Collator("de", {usage: "search"}).compare)
["AE", "\xC4"]

d8> ["AE", "Ä"].sort(new Intl.Collator("de", {usage: "sort"}).compare)
["Ä", "AE"]
d8> ["AE", "Ä"].sort(new Intl.Collator("de", {usage: "search"}).compare)
["AE", "Ä"]

anba avatar Aug 08 '18 13:08 anba

Looks like we need to pass the correct value to the -u-co- extension in icu::Locale.

gsathya avatar Aug 08 '18 13:08 gsathya

Hmm, looks like this will make ICU add the u-co-search extension to the language tag, giving us

d8> new Intl.Collator("de", {usage: "search"}).resolvedOptions().locale
"de-u-co-search"

But according to the ECMA402 spec, "search" is not a valid value for the "co" extension. V8 will have to manually trim the u-co-search subtag from the locale. What was the reason to ban it in the ECMA402 spec?

gsathya avatar Aug 08 '18 14:08 gsathya
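
As a rough illustration of the concern above, this sketch inspects what an engine reports for a search collator. The variable names are invented for the example; only `Intl.Collator` and `resolvedOptions()` come from ECMA-402. On an engine that follows the spec, `usage` should read back as "search" and the locale string should carry no `co-search` subtag.

```javascript
// Sketch only: check whether the internal "co-search" subtag leaks into
// the locale reported by resolvedOptions().
const searchCollator = new Intl.Collator("de", { usage: "search" });
const resolved = searchCollator.resolvedOptions();

// True if the engine exposes the ICU-level subtag to script.
const leaksSearchSubtag = resolved.locale.includes("-co-search");

console.log(resolved.usage, resolved.locale, leaksSearchSubtag);
```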

I guess "search" (and "standard") were banned from "co" to force users to use the "usage" option instead. (IIRC there are notes somewhere explaining the motivation for the "usage" option and why it only accepts "search" and "sort", but I don't have a link handy.)

anba avatar Aug 08 '18 15:08 anba

> I guess "search" (and "standard") were banned from "co" to force users to use the "usage" option instead.

This seems pretty weak given how users can set numeric or caseFirst from both the extension and options. Can we change the spec?

gsathya avatar Aug 08 '18 15:08 gsathya
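
To make the asymmetry concrete, here is a minimal sketch assuming an engine that follows the current ECMA-402 text. The variable names are illustrative. `numeric` can be set through either the `kn` extension key or the option, while a `co-search` subtag in the locale is not a way to request search behavior.

```javascript
// numeric can come from the locale extension or from the options bag.
const viaExtension = new Intl.Collator("de-u-kn-true");
const viaOption = new Intl.Collator("de", { numeric: true });

// "search" is not accepted as a "co" value, so usage keeps its default.
const viaCoSearch = new Intl.Collator("de-u-co-search");

console.log(viaExtension.resolvedOptions().numeric);
console.log(viaOption.resolvedOptions().numeric);
console.log(viaCoSearch.resolvedOptions().usage);
```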

https://norbertlindenberg.com/2012/12/ecmascript-internationalization-api/index.html confirms that this is the reason why "co" doesn't allow "search" and "standard".

Link to meeting notes where "usage" was added: https://docs.google.com/document/d/1-NytPBbsO7dLvt0C2psJkF1Wtt3QdKF3NQAHksF5fyc/edit?hl=ko

Plus https://docs.google.com/document/d/1jCoJ7NU8JDTkBKPCDrSpdCVGd8DTl50iVwXvVhhC21s/edit?hl=sr

anba avatar Aug 08 '18 17:08 anba

The general idea was that options would represent the needs of the application using the API, while the locale reflects the preferences of the user. Of course sometimes there’s overlap, which is why some features can be controlled via both options and locale subtags.

In this case, whether you want sorting or searching behavior (to the extent that there’s a difference), is clearly determined by the app functionality in whose implementation it’s used. This choice is orthogonal to the language preferences of the user.

NorbertLindenberg avatar Aug 08 '18 22:08 NorbertLindenberg
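
One way to picture that split, as a sketch only: the locale comes from the user, and the application decides per feature whether it needs sorting or searching. The helper name `makeCollators`, its property names, and the choice of `sensitivity: "base"` for matching are assumptions made for the example, not anything the spec prescribes.

```javascript
// Hypothetical helper: the caller supplies the user's locale; the app
// picks the usage for each feature it implements.
function makeCollators(userLocale) {
  return {
    // For ordering a visible list.
    forSorting: new Intl.Collator(userLocale, { usage: "sort" }),
    // For deciding whether two strings match, ignoring case and accents.
    forMatching: new Intl.Collator(userLocale, {
      usage: "search",
      sensitivity: "base",
    }),
  };
}

const collators = makeCollators("de");
console.log(["b", "a", "c"].sort(collators.forSorting.compare));
console.log(collators.forMatching.compare("a", "A") === 0);
```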

@anba what build of V8 are you using that respects it?

PS F:\intl\test262> eshost -tse "['AE', 'Ä'].sort(new Intl.Collator('de', {usage: 'sort'}).compare)"
┌────────────────────────────────┬──────┐
│ ch (Chakra1 x64_debug)         │ Ä,AE │
│ ch (Chakra2 x64_debug)         │      │
│ Chakra (JSVU)                  │      │
│ SpiderMonkey                   │      │
│ V8                             │      │
├────────────────────────────────┼──────┤
│ jshost (unreleased/rs5 ICU 61) │ �,AE │
└────────────────────────────────┴──────┘
PS F:\intl\test262> eshost -tse "['AE', 'Ä'].sort(new Intl.Collator('de', {usage: 'search'}).compare)"
┌────────────────────────────────┬──────┐
│ SpiderMonkey                   │ AE,Ä │
├────────────────────────────────┼──────┤
│ ch (Chakra1 x64_debug)         │ Ä,AE │
│ ch (Chakra2 x64_debug)         │      │
│ Chakra (JSVU)                  │      │
│ V8                             │      │
├────────────────────────────────┼──────┤
│ jshost (unreleased/rs5 ICU 61) │ �,AE │
└────────────────────────────────┴──────┘

This is using V8 7.0 and SpiderMonkey 62; only SpiderMonkey seems to do anything different for me. I know I explicitly added a comment saying "I have no idea what the difference is between these two modes" and ignored the option value that is retrieved in InitializeCollator.

jackhorton avatar Aug 24 '18 20:08 jackhorton

> @anba what build of V8 are you using that respects it?

V8 6.9 should have the correct behavior. I broke it a couple of weeks ago, and I have a fix in review now.

gsathya avatar Aug 24 '18 20:08 gsathya

Just put a PR out for Chakra to respect usage properly, thanks to the information in this thread!

https://github.com/Microsoft/ChakraCore/pull/5651

jackhorton avatar Aug 28 '18 23:08 jackhorton

Interesting that multiple engines ran into this issue. Now that we have the test262 locales tag, I think we could write shared tests for this case. Would anyone be interested in that? cc @FrankYFTang @Ms2ger

littledan avatar Oct 14 '18 09:10 littledan

@markusicu said on another issue that you can't implement searching with just the comparison operation. In the absence of a collator-based search API (which I'm skeptical of adding), what use case does the exposure of search collations for the comparison operation address?

Since the search collations take space, it seems bad for implementations to carry the search collations unless they address a useful use case in combination with the API surface that is available. Do they?

hsivonen avatar Nov 26 '21 07:11 hsivonen
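
For context, about the most that `compare` alone supports is whole-string matching, along these lines. This is a sketch: `findMatches` is a hypothetical helper, and `sensitivity: "base"` is one possible choice. It filters a list for entries the collator treats as equal to the query; it cannot locate a match inside a longer string, which is the limitation raised above.

```javascript
// Hypothetical helper: keep the items that collate as equal to the query.
function findMatches(items, query, locale) {
  const collator = new Intl.Collator(locale, {
    usage: "search",
    sensitivity: "base",
  });
  return items.filter((item) => collator.compare(item, query) === 0);
}

const matches = findMatches(
  ["Congrès", "congres", "Assemblée", "poisson"],
  "congres",
  "fr"
);
console.log(matches); // ["Congrès", "congres"]
```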

2023-02-09 TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2023-02-09.md#what-should-happen-with-the-usage-sort-vs-search-choice-in-implementations-256

Action: Add a sentence or two to MDN and/or the spec better explaining this, then the issue can be closed.

sffc avatar Feb 10 '23 04:02 sffc