CLDR-17382 languagematch Ukrainian should not fall back to Russian
CLDR-17382
- [X] This PR completes the ticket.
ALLOW_MANY_COMMITS=true
I didn't think we were changing the match to English, I think it should match what was done for Macedonian. I thought the goal was to remove the language match and not to explicitly cause it to match to English.
Before making the change, I looked at some of the other language matches and previous changes, and it seemed that a change, not deletion, was the right thing to do in this case. For Estonian they commented it out because (according to ticket/PR comments) they couldn't seem to decide between Russian and English and decided not to have a fallback at all. Many other languages have explicit fallback to English. I didn't want to leave the fallback to be random or unpredictable so I went with explicit.
I think we can wait until tomorrow to see what Markus says, rather than resolving the conversation now.
The information we got is "Ukrainian-language users don't want to be matched with Russian-language contents". The corresponding data change is to remove (comment out) the languageMatch for this pair.
This is language matcher data which feeds into an implementation (e.g., ICU LocaleMatcher) which a caller sets up with a list of supported languages plus an optional default language. When there is no match, the matcher returns the default language if one is set; otherwise it returns a "no match" result.
The default language is chosen by the caller. It need not be English. And not setting one at all is a valid, important choice. Some callers have special strategies for what to do next.
When we overcorrect and force a "fallback" to English, then we short-circuit the functioning of the algorithm and defeat the caller's intent.
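To make the concern concrete, here is a minimal toy sketch of the behavior described above. It is not ICU's LocaleMatcher: `best_match`, the match table, and the distance value 40 are all hypothetical, chosen only to show how an explicit one-way entry to English can pre-empt the caller's own default.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical one-way match table: (desired, supported) -> distance.
# The entries and distances are illustrative, not CLDR data.
MatchTable = Dict[Tuple[str, str], int]


def best_match(
    desired: Sequence[str],
    supported: Sequence[str],
    table: MatchTable,
    default: Optional[str] = None,
    threshold: int = 50,
) -> Optional[str]:
    """Toy matcher: exact matches first, then the closest one-way match."""
    # Pass 1: any desired language that is directly supported wins.
    for want in desired:
        if want in supported:
            return want
    # Pass 2: closest one-way match within the threshold (assumed inclusive).
    candidates = [
        (dist, have)
        for (want, have), dist in table.items()
        if want in desired and have in supported and dist <= threshold
    ]
    if candidates:
        return min(candidates)[1]
    # No match: hand back the caller's default, or None for "no match".
    return default


supported = ["de", "en"]

# No languageMatch entry for Ukrainian: the caller's choice decides.
no_entry_with_default = best_match(["uk"], supported, {}, default="de")
no_entry_no_default = best_match(["uk"], supported, {})

# With a forced uk -> en entry, the caller's default "de" is never reached.
forced_english = best_match(["uk"], supported, {("uk", "en"): 40}, default="de")
```

In this toy model the first call returns `"de"`, the second returns `None` (so the caller can apply its own strategy), and the third returns `"en"` even though the caller asked for German as the default.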
We should handle this like the other geopolitical cases in the past, like Macedonian.
I haven't reviewed other "fallbacks to English" in detail. I assumed that they were generally matches based on some information, like populations are actually somewhat likely to understand English because it's one of the local government and entertainment languages, or remaining influence in colonies, etc. I would expect similar data for one-way matches to French (e.g., Breton --> French), Portuguese, Chinese, and Arabic.
Some of these might not make sense. At a glance, I see a one-way match from Esperanto to English; that looks bogus.
A review of existing data deserves a separate ticket.
Upshot
I don't think it matters much whether we include an explicit fallback or not, and we don't for many languages. So to be conservative, we could omit the fallback mapping to English, then revisit this in the next cycle.
Background
I looked at this a bit. If someone uses the default settings with the proposed change, here is what happens.
- the default direction includes ONE_WAY
- the default max threshold distance is 50
- the default locale is the first supported language (the caller really has to supply the supported locales)
If the user's desired languages are <Ukrainian, French> and the default language is set to German, then the priority order among the app's supported languages would be:
<Ukrainian, French, English, German>
So on systems that allow for secondary desired languages, such as iOS, Android, and macOS, it is easy for users to get the desired result if their favored fallback language is French (or Russian) rather than German. (Of course, that doesn't necessarily mean that users take advantage of this ability.)
The difference would be that English would come before what the system has set as a default.
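The ordering described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: `priority_order`, the match table, and the distance 40 are hypothetical, and this is not how ICU LocaleMatcher is implemented.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical one-way match table: (desired, supported) -> distance.
MatchTable = Dict[Tuple[str, str], int]


def priority_order(
    desired: Sequence[str],
    table: MatchTable,
    default: Optional[str] = None,
    threshold: int = 50,
) -> List[str]:
    """Toy model: desired languages, then one-way fallbacks, then the default."""
    order = list(desired)
    # One-way fallbacks for any desired language, closest first.
    fallbacks = sorted(
        (dist, have)
        for (want, have), dist in table.items()
        if want in desired and dist <= threshold
    )
    for _, have in fallbacks:
        if have not in order:
            order.append(have)
    # The caller's default comes last, if one is set.
    if default is not None and default not in order:
        order.append(default)
    return order


# Proposed change: explicit one-way match uk -> en (distance is illustrative).
with_english = priority_order(["uk", "fr"], {("uk", "en"): 40}, default="de")

# Entry commented out instead: no Ukrainian-specific fallback at all.
without_entry = priority_order(["uk", "fr"], {}, default="de")
```

Here `with_english` is `["uk", "fr", "en", "de"]`, while `without_entry` is `["uk", "fr", "de"]`: the only difference is whether English is tried before the system default.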
Now, if the system doesn't set a specific default, or doesn't order the supported languages to put a reasonable default as the first one, the results would be a bit random. On the other hand, that is the case for many of our current languages, since we don't always have a fallback for all major languages. I suspect that a very large number of users of LocaleMatcher will use a system default of English; although that might be different in Ukraine.
I think that's why we made an effort to have fallback locales for most locales that are not "top tier" (i.e., those supported by most applications), so that the user would get some reasonable result for systems that don't set the default based on the likelihood that users in that country would understand the language.
@macchiati , I'm not sure if I can discern a tie-breaking call in your reply. Should we default to English or leave it commented out like Markus suggests?
> I don't think it matters much whether we include an explicit fallback or not, and we don't for many languages. So to be conservative, we could omit the fallback mapping to English, then revisit this in the next cycle.
Yes please let's just comment out the offensive mapping as requested in the ticket.
> I think that's why we made an effort to have fallback locales for most locales that are not "top tier" (i.e., those supported by most applications), so that the user would get some reasonable result for systems that don't set the default based on the likelihood that users in that country would understand the language.
That last part is important. If it's reasonable to assume that people who understand language x might also understand English or French or... then we should have a medium-high-distance one-way match for that. If not, then we shouldn't have a languageMatch entry. I would argue that Esperanto-->English is the latter. (And I am not asking for changing that in this PR nor under this ticket!)
> @macchiati , I'm not sure if I can discern a tie-breaking call in your reply. Should we default to English or leave it commented out like Markus suggests?
Let's comment it out for now, and revisit next cycle.
> If it's reasonable to assume that people who understand language x might also understand English or French or... then we should have a medium-high-distance one-way match for that.
I don't see it that way. The goal for the fallbacks should be: based on the best information we have, in the absence of any other information, what are people most likely to understand if the language in question is not available. [Caveat geopolitical]
So when people can't supply secondary languages, is it better to:
- pick that fallback (noting that when that fallback is not supported, it will fall further back to the system default),
- or go with a system default?
I'll go ahead and merge