`Cldr.AcceptLanguage.best_match` not returning nearest locale
Hi @kipcole9. First of all, thanks for the great lib!
We are using cldr for matching "accept-language" headers to locales which we support, so that we can translate content properly (via gettext), specifically, the best match function.
The issue we are facing is that some locales we do not support are not falling back to the nearest locales we do support. For example, given that we support "es_ES", "en_US", "zh_CN", "zh_HK":
gettext_locale_name = fn locale ->
{:ok, tag} = MyApp.Cldr.AcceptLanguage.best_match(locale)
tag.gettext_locale_name
end
iex> gettext_locale_name.("es-ES")
"es_ES"
iex> gettext_locale_name.("es-US")
nil # should be "es_ES"
iex> gettext_locale_name.("en-AU")
"en" # should be "en_US"
iex> gettext_locale_name.("zh-Hans")
"zh_CN"
iex> gettext_locale_name.("zh-Hant")
nil # should be "zh_HK"
I couldn't find a function in cldr which improves on this.
A workaround is stripping off the variant suffix and going through the function again with the top level language, e.g. "es". Alternatively for Chinese variants we can look at the script. But wondering if there is a less manual way to do this.
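For illustration, such a workaround might look roughly like the sketch below (the `LocaleFallback` module name is just for the example, not part of ex_cldr):

defmodule LocaleFallback do
  # Try best_match/1 as given; if no gettext locale is found, strip the last
  # subtag ("es-US" -> "es") and try again with the shorter locale name.
  def gettext_locale_name(locale) do
    case MyApp.Cldr.AcceptLanguage.best_match(locale) do
      {:ok, %{gettext_locale_name: name}} when not is_nil(name) ->
        name

      _ ->
        case String.split(locale, "-") do
          [_language_only] -> nil
          parts -> parts |> Enum.drop(-1) |> Enum.join("-") |> gettext_locale_name()
        end
    end
  end
end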
In case we misconfigured something, our config is:
defmodule MyApp.Cldr do
use Cldr,
otp_app: :my_app,
gettext: MyApp.Gettext,
providers: [],
locales:
MyApp.Gettext
|> Gettext.known_locales()
|> Enum.map(&String.replace(&1, "_", "-"))
end
Tagging my colleague @andreyuhai
Thanks for the kind words, it's much appreciated.
I regret that it's unlikely that MyApp.Cldr.AcceptLanguage.best_match/1 is going to satisfy your requirements - at least in its current form. CLDR has no way to fall back to locale names it doesn't know about. And it doesn't know about en-US, es-ES or zh-HK.
How CLDR names locales
This is because in CLDR there is no such locale as en-US, es-ES or zh-Hans. CLDR's locale naming convention is that the territory with the most native speakers of a language gets the plain, undecorated locale ID. For example:
iex(1)> Cldr.validate_locale!("en-US").cldr_locale_name
:en
iex(2)> Cldr.validate_locale!("es-ES").cldr_locale_name
:es
es-US will fall back to es, not es-ES
Note that the resolved locale will, in ex_cldr, still have the territory set to US because that is valid. But the underlying CLDR locale name is es, not es-ES.
iex(1)> {:ok, locale} = MyApp.Cldr.AcceptLanguage.best_match("es-US")
{:ok, MyApp.Cldr.Locale.new!("es-US")}
iex(2)> locale.cldr_locale_name
:es
That means that the best match - which is the best match to a CLDR Locale ID - can never match en-US, es-ES. Or even pt-BR.
In your other example:
- `en-AU` will fall back to `en` by the same rules. `en` _is_ the CLDR locale name for `en-US`.
- `zh-Hant` exists as a CLDR locale name but it's not configured in your CLDR backend. If it was, it would resolve to that name (which by the same rules would effectively resolve to `zh-Hant-TW`). In CLDR there is both `zh-Hant-HK` and `zh-Hans-HK`.
And the one that surprises me:
Your example shows:
iex> gettext_locale_name.("zh-Hans")
"zh_CN"
But I see:
iex(8)> {:ok, locale} = MyApp.Cldr.AcceptLanguage.best_match("zh-Hans")
{:ok, MyApp.Cldr.Locale.new!("zh-Hans-CN")}
iex(9)> locale.cldr_locale_name
:zh
Which is what I expected. Again, by the rules, zh has what CLDR calls "likely subtags" and they include Hans for the script and CN for the territory. You can see what the likely subtags are by:
iex(10)> Cldr.Locale.likely_subtags("zh")
#Cldr.LanguageTag<zh-Hans-CN [parsed]>
Possible path forward
The most immediate path forward I see, if you want to use MyApp.Cldr.AcceptLanguage.best_match/1, is to rename your Gettext locales to be consistent with the CLDR locale naming conventions.
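As a quick way to see where each of your Gettext locale names would land, something like this sketch can help (illustrative only; it uses `Cldr.Locale.canonical_language_tag/2` and the exact results depend on your configuration):

# Map each Gettext locale name to the CLDR locale name it resolves to
# (if any) in your backend. Purely illustrative.
for name <- Gettext.known_locales(MyApp.Gettext) do
  bcp47 = String.replace(name, "_", "-")

  case Cldr.Locale.canonical_language_tag(bcp47, MyApp.Cldr) do
    {:ok, tag} -> {name, tag.cldr_locale_name}
    {:error, _reason} -> {name, :unknown_to_cldr}
  end
end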
Next steps
I am curious about how a best match for zh-Hans came to be zh-CN since there is no such locale ID, so I need to investigate that further - unless perhaps that was a copy/paste error on your side?
Are you able to share what the output of:
1. `MyApp.Cldr.known_locale_names()`
2. `MyApp.Cldr.__cldr__(:config)`
3. `Gettext.known_locales(MyApp.Gettext)`
Look like?
> This is because in CLDR there is no such locale as `en-US`, `es-ES` or `zh-Hans`. CLDR's locale naming convention is that the territory with the most native speakers of a language gets the plain, undecorated locale ID. For example:
>
> iex(1)> Cldr.validate_locale!("en-US").cldr_locale_name
> :en
> iex(2)> Cldr.validate_locale!("es-ES").cldr_locale_name
> :es
I can see from the linked repo that we do indeed have es_ES and en_US, but not zh_Hans. But those files have basically no content, which I suppose is what you also mean by no locale.
However, I was under the impression that CLDR used BCP47 locale formats (which is the standard we follow), or at least that a mapping between the two here would happen given that the "accept-language" header uses this format.
> es-US will fall back to `es`, not `es-ES`
>
> Note that the resolved locale will, in `ex_cldr`, still have the territory set to `US` because that is valid. But the underlying CLDR locale name is `es`, not `es-ES`.
>
> iex(1)> {:ok, locale} = MyApp.Cldr.AcceptLanguage.best_match("es-US")
> {:ok, MyApp.Cldr.Locale.new!("es-US")}
> iex(2)> locale.cldr_locale_name
> :es
>
> That means that the best match - which is the best match to a CLDR Locale ID - can never match `en-US`, `es-ES`. Or even `pt-BR`.
Aha, maybe that's where we misunderstood. Because we were assuming that, "given this list of locales we want to support, we want the closest match to be returned from this list".
> In your other example:
>
> * `en-AU` will fall back to `en` by the same rules. `en` _is_ the CLDR locale name for `en-US`.
> * `zh-Hant` exists as a CLDR locale name but it's not configured in your CLDR backend. If it was, it would resolve to that name (which by the same rules would effectively resolve to `zh-Hant-TW`). In CLDR there is both `zh-Hant-HK` and `zh-Hans-HK`.
Yep, but under BCP47 I think that zh-Hant is still valid as traditional Chinese script, whose nearest match should be zh-HK.
> And the one that surprises me:
>
> Your example shows:
>
> iex> gettext_locale_name.("zh-Hans")
> "zh_CN"
>
> But I see:
>
> iex(8)> {:ok, locale} = MyApp.Cldr.AcceptLanguage.best_match("zh-Hans")
> {:ok, MyApp.Cldr.Locale.new!("zh-Hans-CN")}
> iex(9)> locale.cldr_locale_name
> :zh
>
> Which is what I expected. Again, by the rules, `zh` has what CLDR calls "likely subtags" and they include `Hans` for the script and `CN` for the territory. You can see what the likely subtags are by:
>
> iex(10)> Cldr.Locale.likely_subtags("zh")
> #Cldr.LanguageTag<zh-Hans-CN [parsed]>
>
> Possible path forward
>
> The most immediate path forward I see, if you want to use `MyApp.Cldr.AcceptLanguage.best_match/1`, is to rename your Gettext locales to be consistent with the CLDR locale naming conventions.
>
> Next steps
>
> I am curious about how a best match for `zh-Hans` came to be `zh-CN` since there is no such locale ID, so I need to investigate that further - unless perhaps that was a copy/paste error on your side?
Sure, let me paste full output:
iex> show_locale = fn locale -> {:ok, locale} = MarketplaceSearch.Cldr.AcceptLanguage.best_match(locale); Map.from_struct(locale) end
iex> show_locale.("zh-Hans")
%{
script: :Hans,
extensions: %{},
transform: %{},
language: "zh",
locale: %{},
backend: MarketplaceSearch.Cldr,
cldr_locale_name: :zh,
gettext_locale_name: "zh_CN",
territory: :CN,
requested_locale_name: "zh-Hans",
canonical_locale_name: "zh-Hans",
language_variants: [],
rbnf_locale_name: :zh,
language_subtags: [],
private_use: []
}
Are you able to share what the output of:
1. `MyApp.Cldr.known_locale_names()`
iex(5)> MarketplaceSearch.Cldr.known_locale_names
[:ar, :bg, :cs, :da, :de, :el, :en, :"en-CA", :"en-GB", :es, :fi, :fr, :"fr-CA",
:hr, :hu, :id, :it, :ja, :lt, :ms, :nb, :nl, :pl, :pt, :ro, :ru, :sl, :sv, :th,
:uk, :vi, :zh]
2. `MyApp.Cldr.__cldr__(:config)`
iex(6)> MarketplaceSearch.Cldr.__cldr__(:config)
%Cldr.Config{
default_locale: :en,
locales: [:ar, :bg, :cs, :da, :de, :el, :en, :"en-CA", :"en-GB", :es, :fi,
:fr, :"fr-CA", :hr, :hu, :id, :it, :ja, :lt, :ms, :nb, :nl, :pl, :pt, :ro,
:ru, :sl, :sv, :th, :uk, :und, :vi, :zh],
add_fallback_locales: false,
backend: MarketplaceSearch.Cldr,
gettext: MarketplaceSearch.Gettext,
data_dir: "/app/_build/prod/lib/marketplace_search/priv/cldr",
providers: [],
precompile_number_formats: [],
precompile_transliterations: [],
precompile_date_time_formats: [],
precompile_interval_formats: [],
default_currency_format: nil,
otp_app: :marketplace_search,
generate_docs: true,
suppress_warnings: false,
message_formats: %{},
force_locale_download: false,
https_proxy: nil
}
3. `Gettext.known_locales(MyApp.Gettext)`
iex(7)> Gettext.known_locales(MarketplaceSearch.Gettext)
["ar", "bg_BG", "cs_CZ", "da_DK", "de_DE", "el_GR", "en_CA", "en_GB", "en_US",
"es_ES", "fi_FI", "fr_CA", "fr_FR", "hr_HR", "hu_HU", "id_ID", "it_IT",
"ja_JP", "lt_LT", "ms_MY", "nb_NO", "nl_NL", "pl_PL", "pt", "pt_BR", "ro_RO",
"ru_RU", "sl_SI", "sv_SE", "th_TH", "uk_UA", "vi_VN", "zh_CN", "zh_HK"]
Look like?
Thanks for the prompt response.
> Yep, but under BCP47 I think that zh-Hant is still valid as traditional Chinese script, whose nearest match should be zh-HK.
zh-Hant is definitely a valid locale name and there is a CLDR locale called zh-Hant. If you configure zh-Hant-HK in your ex_cldr backend you will then see:
iex(1)> {:ok, locale} = MyApp.Cldr.AcceptLanguage.best_match("zh-HK")
{:ok, MyApp.Cldr.Locale.new!("zh-Hant-HK")}
# but ......
iex(2)> {:ok, locale} = MyApp.Cldr.AcceptLanguage.best_match("zh-Hant")
{:ok, MyApp.Cldr.Locale.new!("zh-Hant-TW")}
Which I will definitely look at. The first example is what I expected. The second example is not. I think the second example should best match also to zh-Hant-HK which I understand is your expectation too. And that should then match to your gettext locale of zh_HK
A lot of this complexity comes from having to do two kinds of matches:
- Find the best fit configured CLDR locale
- Then find its best fitting Gettext locale
Next steps
- I think your kind efforts have identified that at least `best_match/1` should resolve `zh-Hant` to `zh-Hant-HK` when `zh-Hant-HK` is configured in `ex_cldr`. And then that should resolve to `zh_HK` in your gettext backend.
- I will also revisit the matching I'm using to find a gettext locale. I think it should be possible to match CLDR's `en` to your gettext `en_US` given that the derived territory for `en` is `US`.
It's a horrible hour in my zone now so give me a few hours' sleep and I'll dig into this in my morning and resolve one way or another as quickly as I can.
TL;DR (but do please read): I can do as you'd like and best match zh-Hant to zh-Hant-HK, but I'm not sure it's a good idea, I'm not sure it's sustainable, and it definitely won't resolve that way in the upcoming localize library in 2026.
I've been thinking about this all morning, especially what the right best match is for zh-Hant. In CLDR, that is unambiguously zh-Hant-TW because without any hinting, the territory with the largest number of native zh-Hant speakers is TW.
This becomes more important when we get to next year and I launch localize, which is basically "ex_cldr version 3.0". It will have no concept of locale configuration or backends. All CLDR locales will always be available and they'll be dynamically loaded on demand.
Therefore, if I apply hinting now so that zh-Hant will best match zh-Hant-HK if it's configured, that hinting won't be useful in the future. Perhaps you'll reasonably say you don't care about that for now - you just need a solution now!
I do have enough data to be able to apply a hint based upon configuration. There is data (not very reliable, but good enough for this) on which languages have which primary scripts and which territories those apply to. So I can take zh-Hant and, through that data returned by Cldr.Config.language_data/0, know to check if any of the territories listed as :primary is configured and then use that for best match.
However, as I mentioned before, the data behind Cldr.Config.language_data/0 is brittle, and actually the key territory data is removed in CLDR 48 (coming end of this month). I can still derive enough for your requirement but it's brittle. And when localize comes out, this configuration hinting won't work since all locales are always available and zh-Hant will always best match to zh-Hant-TW.
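Roughly, that hinting would look something like the sketch below. It's illustrative only - the `HintedMatch` module is hypothetical and I'm assuming an approximate shape for the `Cldr.Config.language_data/0` data, which (as noted) is brittle and changes in CLDR 48:

defmodule HintedMatch do
  # Given a language ("zh") and the locale names configured in the backend,
  # return the first :primary territory for that language which appears in
  # one of the configured locale names - e.g. :HK when zh-Hant-HK is configured.
  def hinted_territory(language, configured_locale_names) do
    case Cldr.Config.language_data()[language] do
      %{primary: %{territories: territories}} ->
        Enum.find(territories, fn territory ->
          Enum.any?(configured_locale_names, fn name ->
            String.ends_with?(to_string(name), "-#{territory}")
          end)
        end)

      _other ->
        nil
    end
  end
end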
> > Yep, but under BCP47 I think that zh-Hant is still valid as traditional Chinese script, whose nearest match should be zh-HK.
>
> `zh-Hant` is definitely a valid locale name and there is a CLDR locale called `zh-Hant`. If you configure `zh-Hant-HK` in your `ex_cldr` backend you will then see:
>
> iex(1)> {:ok, locale} = MyApp.Cldr.AcceptLanguage.best_match("zh-HK")
> {:ok, MyApp.Cldr.Locale.new!("zh-Hant-HK")}
>
> but ......
>
> iex(2)> {:ok, locale} = MyApp.Cldr.AcceptLanguage.best_match("zh-Hant")
> {:ok, MyApp.Cldr.Locale.new!("zh-Hant-TW")}
>
> Which I will definitely look at. The first example is what I expected. The second example is not. I think the second example should best match also to `zh-Hant-HK` which I understand is your expectation too. And that should then match to your gettext locale of `zh_HK`.
Yep 👍 that's what we would expect
> A lot of this complexity comes from having to do two kinds of matches:
>
> 1. Find the best fit configured CLDR locale
> 2. Then find its best fitting Gettext locale
It is a tricky challenge indeed.
> Next steps
>
> 1. I think your kind efforts have identified that at least `best_match/1` should resolve `zh-Hant` to `zh-Hant-HK` when `zh-Hant-HK` is configured in `ex_cldr`. And then that should resolve to `zh_HK` in your gettext backend.
> 2. I will also revisit the matching I'm using to find a gettext locale. I think it should be possible to match CLDR's `en` to your gettext `en_US` given that the derived territory for `en` is `US`.
>
> It's a horrible hour in my zone now so give me a few hours' sleep and I'll dig into this in my morning and resolve one way or another as quickly as I can.
>
> TL;DR (but do please read): I can do as you'd like and best match `zh-Hant` to `zh-Hant-HK`, but I'm not sure it's a good idea, I'm not sure it's sustainable, and it definitely won't resolve that way in the upcoming `localize` library in 2026.
>
> I've been thinking about this all morning, especially what the right best match is for `zh-Hant`. In CLDR, that is unambiguously `zh-Hant-TW` because without any hinting, the territory with the largest number of native `zh-Hant` speakers is `TW`.
>
> This becomes more important when we get to next year and I launch `localize`, which is basically "ex_cldr version 3.0". It will have no concept of locale configuration or backends. All CLDR locales will always be available and they'll be dynamically loaded on demand.
Oh nice, looking forward to it 🙌 !
> Therefore, if I apply hinting now so that `zh-Hant` will best match `zh-Hant-HK` if it's configured, that hinting won't be useful in the future. Perhaps you'll reasonably say you don't care about that for now - you just need a solution now!
>
> I do have enough data to be able to apply a hint based upon configuration. There is data (not very reliable, but good enough for this) on which languages have which primary scripts and which territories those apply to. So I can take `zh-Hant` and, through that data returned by `Cldr.Config.language_data/0`, know to check if any of the territories listed as `:primary` is configured and then use that for best match.
>
> However, as I mentioned before, the data behind `Cldr.Config.language_data/0` is brittle, and actually the key territory data is removed in CLDR 48 (coming end of this month). I can still derive enough for your requirement but it's brittle. And when `localize` comes out, this configuration hinting won't work since all locales are always available and `zh-Hant` will always best match to `zh-Hant-TW`.
So if I understand correctly, even with localize we are unlikely to find the mapping we need here via cldr between BCP47 locales and gettext keys? We would need to either modify our translation file names or adjust the incoming locales?
@hugomorg I've been reflecting on this a while and I think there is a reasonable compromise I can implement.
- When resolving CLDR locales, and doing a best match, `ex_cldr` should continue to use CLDR best match rules. That means a best match for `zh-Hant` will resolve to `zh-Hant-TW` for the **CLDR** locale.
- However, when matching to a gettext locale, I can be more liberal and `zh-Hant-TW` can aim to best match with a gettext locale of `zh_HK`. The only tricky thing is to work out how the data can make that work. I think the mapping data can be resolved in CLDR but I need to work on that.
For me, the good news is that this approach remains compliant with CLDR's spec while at the same time still being able to match more liberally with Gettext locales.
You've mentioned a couple of times that a best match for zh-Hant should be possible to zh-HK (really, zh-Hant-HK I'd say, but Hant is the primary script for HK in CLDR, as you'd expect).
Do you have a reference for that? I haven't come across that path in the CLDR spec. And it may just be that my brain is fried from decoding TR35 for the last 8 years!
If this approach is ok with you I should be able to get something testable done by the weekend if not before.
> @hugomorg I've been reflecting on this a while and I think there is a reasonable compromise I can implement.
>
> 1. When resolving CLDR locales, and doing a best match, `ex_cldr` should continue to use CLDR best match rules. That means a best match for `zh-Hant` will resolve to `zh-Hant-TW` for the **CLDR** locale.
> 2. However, when matching to a gettext locale, I can be more liberal and `zh-Hant-TW` can aim to best match with a gettext locale of `zh_HK`. The only tricky thing is to work out how the data can make that work. I think the mapping data can be resolved in CLDR but I need to work on that.
>
> For me, the good news is that this approach remains compliant with CLDR's spec while at the same time still being able to match more liberally with Gettext locales.
I think I may be misunderstanding then.
I thought that the BCP47 standard was more compatible with CLDR than seems to be the case based on what you are suggesting (and no doubt you know more about these standards than I do)!
> You've mentioned a couple of times that a best match for `zh-Hant` should be possible to `zh-HK` (really, `zh-Hant-HK` I'd say, but `Hant` is the primary script for HK in CLDR, as you'd expect).
>
> Do you have a reference for that? I haven't come across that path in the CLDR spec. And it may just be that my brain is fried from decoding TR35 for the last 8 years!
>
> If this approach is ok with you I should be able to get something testable done by the weekend if not before.
Thanks for considering a change here. In our case, I thought that zh-HK and es-US would be regarded as valid locale ids (at least according to BCP47) due to the {language_code}-{region_code} syntax. And I also assumed zh-Hant should essentially be equivalent to zh-TW, but the nearest neighbour after that should be zh-HK. I'm curious, is there any particular algorithm you are using to match these under the hood?
> I thought that the BCP47 standard was more compatible with CLDR than seems to be the case based on what you are suggesting (and no doubt you know more about these standards than I do)!
Definitely compatible. CLDR is mostly compliant with BCP 47 locale IDs. And your locale names are definitely BCP 47 compliant and compatible with CLDR. No issue there. The full description of conformance is here. BTW, my understanding is that BCP 47 locale IDs use `-` not `_`, but ex_cldr doesn't care which one you use.
Cldr.AcceptLanguage.best_match/1 is primarily focused on returning a language tag that most closely matches a CLDR locale configured in the system. To do that, the process is roughly:
- Canonicalise each of the potential locale IDs from the `Accept-Language` header using `Cldr.Locale.canonical_language_tag/2`. The implementation mostly follows this. The overall process involves parsing, resolving aliases and applying likely subtags. I'm pretty confident this code is correct (per the spec) given it passes the conformance tests - nearly 2_000 of them.
- If the resulting language tag has the field `:cldr_locale_name` filled in then it becomes a candidate to be chosen.
- Sort the candidates by their `q` score and pick the one with the highest score.
The next question is "how does the :cldr_locale_name get filled in?". The primary process follows Language Matching in TR35. But the implementation does take a few shortcuts and this conversation will definitely prompt me to revisit it. Nevertheless, from a CLDR perspective, the language matching is quite robust in finding the most appropriate :cldr_locale_name for a given locale ID.
Then lastly, the question is how do we link to a Gettext locale ID. Not surprisingly this isn't a CLDR concern and the implementation is logical but ad hoc. Basically it does a reductive check of the combinations of language, script and region and checks that against Gettext locale names. Hence why I believe there is room to improve and be able to match a :cldr_locale_name of zh-Hant to a gettext locale of zh_HK and en to en_US. I've worked on a PoC for that tonight and I think I can have something for you to try on Saturday (I won't have much time on Friday to work on this).
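To make those steps concrete, here's a very rough sketch of the overall shape (illustrative only, not the actual implementation; the `RoughBestMatch` module and its naive header parsing are hypothetical, and the real matching follows TR35 as described above):

defmodule RoughBestMatch do
  # Parse an Accept-Language header, keep entries that resolve to a language
  # tag with :cldr_locale_name set, then pick the entry with the highest q.
  def best_match(accept_language) do
    accept_language
    |> String.split(",")
    |> Enum.map(&parse_entry/1)
    |> Enum.flat_map(fn {q, name} ->
      case MyApp.Cldr.validate_locale(name) do
        {:ok, %{cldr_locale_name: cldr_name} = tag} when not is_nil(cldr_name) -> [{q, tag}]
        _ -> []
      end
    end)
    |> Enum.sort_by(fn {q, _tag} -> q end, :desc)
    |> case do
      [{_q, tag} | _rest] -> {:ok, tag}
      [] -> {:error, :no_match}
    end
  end

  # "fr;q=0.8" -> {0.8, "fr"}; entries without a q value default to 1.0
  defp parse_entry(entry) do
    case String.split(String.trim(entry), ";q=") do
      [name, q] ->
        {quality, _rest} = Float.parse(q)
        {quality, name}

      [name] ->
        {1.0, name}
    end
  end
end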
I spent the weekend mapping out how to do this and I've got a good plan. And I found some good test data trolling the CLDR repo. I just need a few days to implement - I'm confident it will resolve this issue in a way that works well for your use case.
> I spent the weekend mapping out how to do this and I've got a good plan. And I found some good test data trolling the CLDR repo. I just need a few days to implement - I'm confident it will resolve this issue in a way that works well for your use case.
Hi @kipcole9 apologies for the delay, I was off for a few days. Thank you for continuing to look into this.
Would this also help with those "non-conventional" locales, e.g. mapping es-US to es-ES (if only the latter has been registered as a locale)?
Apologies for the delay - I was stuck finishing up some significant work on ex_cldr_dates_times which is now done.
I have pushed two commits (e1d175c0e038a81de158d11b936292eacae5e391 and e655ef6296bcb391e06327cb8dd9138ee54c1fb5) that implement the CLDR language matching algorithm and it is returning (as expected) much better results. For example:
iex> Cldr.Locale.Match.best_match "zh-HK",
...> supported: ["zh", "zh-Hans", "zh-Hant", "en", "fr", "en-Hant"]
[{"zh-Hant", 10}, {"zh", 59}, {"zh-Hans", 59}, {"en-Hant", 89}]
It also improves locale matching for your es-US versus es-MX question:
iex> Cldr.Locale.Match.best_match "es-US", supported: ["es-ES", "es-MX", "es-AR"]
[{"es-MX", 9}, {"es-AR", 9}, {"es-ES", 10}]
Note that in this example, es-MX matches more closely with es-US than es-ES does. This is because the varieties of Spanish spoken in the Americas have greater affinity with each other than with the Spanish spoken in Europe.
You'll see similar affinity in the English variations too:
iex> Cldr.Locale.Match.best_match "en-AU", supported: ["en-CA", "en", "en-GB"]
[{"en-GB", 8}, {"en-CA", 10}, {"en", 10}]
Here, en-GB is considered a closer match than en (meaning en-US).
I have a small amount of work still to go on this but it's close now. I will post here when I have a version you can try.
Primarily I need to apply this matching to the gettext_locale_name field in a language tag and finalise some work on what CLDR calls "paradigm locales" which I don't fully understand yet.
If you're up for an early test, I've pushed a release candidate to GitHub. You can configure it by:
{:ex_cldr, github: "elixir-cldr/cldr48"}
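In mix.exs that replaces the usual hex entry, roughly:

defp deps do
  [
    # Release candidate from GitHub instead of the published hex package
    {:ex_cldr, github: "elixir-cldr/cldr48"}
    # ...plus your existing dependencies
  ]
end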
You should find that the :gettext_locale_name is populated as you expect. If not - it's a bug on my side.
Note that other ex_cldr_* libraries need updating to work with this version of ex_cldr. In particular current hex ex_cldr_dates_times most likely won't work. There are a lot of updates coming to various libs to support CLDR48. All of which should be done in the next week.
I should have noted there are still 12 test cases on locale matching that are failing (out of 125 or so) so I still have some work to do on the implementation but I believe the current code covers the configuration you described for your use case.
Hey @kipcole9, thank you. This already looks like a big difference for us! I will do more testing but results below look solid.
I noticed that the underscore has turned into a hyphen but this is minor.
Setup:
gettext_locale_name = fn locale ->
{:ok, tag} = MarketplaceSearch.Cldr.AcceptLanguage.best_match(locale)
tag.gettext_locale_name
end
Enum.each(["es-ES", "es-US", "en-AU", "zh-Hans", "zh-Hant"], fn locale ->
IO.puts("#{locale} -> #{gettext_locale_name.(locale) || "nil"}")
end)
Before:
es-ES -> es_ES
es-US -> nil
en-AU -> en
zh-Hans -> zh_CN
zh-Hant -> nil
After:
es-ES -> es-ES
es-US -> es-ES
en-AU -> en-GB
zh-Hans -> zh-CN
zh-Hant -> zh-HK
Thanks for the feedback, glad it's shaping up. I made an effort to return a locale name in the same format as the requested one so I'll look back into that - :gettext_locale_name needs to be the same name as gettext knows it by for obvious reasons. Thanks for pointing that out. I'll have another spin ready in my early morning (UTC+11).
I've pushed an update to https://github.com/elixir-cldr/cldr48 that is, I hope, the final release candidate of ex_cldr version 2.44.0. If you have a chance, would you mind mix deps.update ex_cldr and testing one more time?
- `:gettext_locale_names` are now exactly as they are named on disk
- `Cldr.Locale.Match.best_match/2` is passing 125 tests and failing on 4. These need further investigation but they are very edge cases that shouldn't prevent publishing the new version
- A bug in locale resolution has been fixed for locales with the language code `und`. Not a language code which would occur in normal use.
I aim to get new versions of the several ex_cldr_* libs that are updated for CLDR 48 published by Monday 3rd.
Hey @kipcole9, with {:ex_cldr, github: "elixir-cldr/cldr48"} I'm now getting an error. I haven't changed any code or translations.
** (Cldr.UnknownLocaleError) Failed to install the locale named "bg-BG". The locale name is not known.
(ex_cldr 2.44.0-rc.6) lib/cldr/install.ex:93: Cldr.Install.do_install_locale_name/3
(elixir 1.16.2) lib/enum.ex:987: Enum."-each/2-lists^foreach/1-0-"/2
(ex_cldr 2.44.0-rc.6) lib/cldr/install.ex:29: Cldr.Install.install_known_locale_names/1
(ex_cldr 2.44.0-rc.6) lib/cldr.ex:102: Cldr.install_locales/1
(ex_cldr 2.44.0-rc.6) expanding macro: Cldr.Backend.Compiler.__before_compile__/1
lib/marketplace_search/cldr.ex:1: MarketplaceSearch.Cldr (module)
@hugomorg apologies, that's not great - and not expected. I'm not even sure where bg-BG comes from because your gettext locale is bg_BG I assume? Working on this now.
@kipcole9 thanks for the quick reply.
The hyphen was appearing because we transform the gettext names:
defmodule MarketplaceSearch.Cldr do
@moduledoc """
CLDR configuration module for MarketplaceSearch.
"""
use Cldr,
otp_app: :marketplace_search,
gettext: MarketplaceSearch.Gettext,
providers: [],
# Using locales with a hyphen works with CLDR, but when CLDR maps them back to a gettext locale,
# it uses the underscore as a separator. Meanwhile, underscored locales passed straight to CLDR
# don't work. So we need to do the transform here.
locales:
MarketplaceSearch.Gettext
|> Gettext.known_locales()
|> Enum.map(&String.replace(&1, "_", "-"))
end
But when I comment out the replace, I get a similar error
== Compilation error in file lib/marketplace_search/cldr.ex ==
** (Cldr.UnknownLocaleError) Failed to install the locale named :bg_BG. The locale name is not known.
(ex_cldr 2.44.0-rc.6) lib/cldr/install.ex:93: Cldr.Install.do_install_locale_name/3
(elixir 1.16.2) lib/enum.ex:987: Enum."-each/2-lists^foreach/1-0-"/2
(ex_cldr 2.44.0-rc.6) lib/cldr/install.ex:29: Cldr.Install.install_known_locale_names/1
(ex_cldr 2.44.0-rc.6) lib/cldr.ex:102: Cldr.install_locales/1
(ex_cldr 2.44.0-rc.6) expanding macro: Cldr.Backend.Compiler.__before_compile__/1
lib/marketplace_search/cldr.ex:1: MarketplaceSearch.Cldr (module)
Thanks for the clarity, that helps.
I think the actual issue is that I have overlooked matching gettext locale names to CLDR locale names. Meaning there is no 'bg-BG' in CLDR, just bg. And I haven't added the code to more flexibly match to a CLDR locale name.
And now I'm wondering how it ever worked with a bg_BG locale. Did you ever get a message similar to:
The locale bg_BG is configured in the gettext backend but is unknown to CLDR. ......
Yep, saw plenty of those warnings :)
Compiling lib/marketplace_search/gettext.ex (it's taking more than 10s)
note: The locales ["bg_BG", "cs_CZ", "da_DK", "de_DE", "el_GR", "en_US", "es_ES", "fi_FI", "fr_FR", "hr_HR", "hu_HU", "id_ID", "it_IT", "ja_JP", "lt_LT", "ms_MY", "nb_NO", "nl_NL", "pl_PL", "pt_BR", "ro_RO", "ru_RU", "sl_SI", "sv_SE", "th_TH", "uk_UA", "vi_VN", "zh_CN", "zh_HK"] are configured in the MarketplaceSearch.Gettext gettext backend but are unknown to CLDR. They will not be used to configure CLDR but they will still be used to match CLDR locales to Gettext locales at runtime
I've pushed a commit and updated the elixir-cldr/cldr48 repo to include code that now better matches gettext locale names to CLDR locale names. It uses a simple method (for now) of just suffix stripping to find a match. I think that works for your use case (and others). It means bg_BG will configure :bg in CLDR and en_US will configure :en. Etc etc. I added additional testing for this as well.
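Conceptually the suffix stripping is no more than the following sketch (simplified; the `SuffixStrip` module is illustrative, not the actual code):

defmodule SuffixStrip do
  # Drop trailing subtags from a Gettext locale name ("bg_BG" -> "bg-BG" -> "bg")
  # until one of the backend's known CLDR locale names matches, or give up.
  def cldr_name_for_gettext(gettext_name, backend \\ MyApp.Cldr) do
    known = Enum.map(backend.known_locale_names(), &to_string/1)
    do_strip(String.replace(gettext_name, "_", "-"), known)
  end

  defp do_strip(name, known) do
    cond do
      name in known ->
        name

      String.contains?(name, "-") ->
        name
        |> String.split("-")
        |> Enum.drop(-1)
        |> Enum.join("-")
        |> do_strip(known)

      true ->
        nil
    end
  end
end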
You've been very patient with this which is greatly appreciated. This release will definitely be more solid in several areas as a result of your collaboration.
Would you mind updating one more time and confirming it's ok (or not!)?
With this last commit the contract now is:
- `:gettext_locale_name` is the same as the name on disk - whatever that was.
- Gettext locale names will be matched to a CLDR locale name by repeated suffix stripping to try and find a match.
- CLDR locale names will be matched to a Gettext locale name by using the new `Cldr.Locale.Match.best_match/2` function.
In the future, Cldr.Locale.Match.best_match/2 will be used for all locale matching. Probably not until localize 0.1.0 in the new year.
Hey @kipcole9 sorry for the delay - I was away for a while. Will test this early next week.
Hey @kipcole9, repeated my test above and it's looking good:
es-ES -> es_ES
es-US -> es_ES
en-AU -> en_GB
zh-Hans -> zh_CN
zh-Hant -> zh_HK
One more question: is there a reason why "en-US" resolves to "en" instead of "en-US"? This doesn't seem to be the case for other locales.
iex(18)> gettext_locale_name.("en-US")
"en"
iex(20)> gettext_locale_name.("en-CA")
"en_CA"
iex(23)> Enum.all?(["en_US", "en_CA"], & &1 in Gettext.known_locales(MarketplaceSearch.Gettext))
true
In CLDR, en and en-US are synonymous (en is expanded to en-US). But due to some special handling of paradigm locales, en will be preferred over en-US.
However, I think you would only see this as an issue if you have both en and en_US gettext locales? You can experiment using Cldr.Locale.Match.best_match/2 like this:
# Does en-US match with the gettext locale `en_US`? Yes.
iex> Cldr.Locale.Match.best_match "en-US", supported: ["en_US"]
{:ok, "en_US", 0}
# What if we support both `en` and `en_US`? `en` will win, even though they
# both match.
iex> Cldr.Locale.Match.best_match "en-US", supported: ["en", "en_US"]
{:ok, "en", 0}
Is there some chance you have both en and en_US gettext locales?