Implement LocaleMatcher

Open sffc opened this issue 2 years ago • 9 comments

We should probably implement the CLDR/ICU LocaleMatcher algorithm in ICU4X.

This subject has previously been a point of contention; I'm hoping that we can at least implement the CLDR algorithm and leave open the opportunity for future improvements to it.

Note: There is an open proposal in ECMA-402, but it has not been active recently.

@zbraniecki

sffc avatar Jan 25 '23 16:01 sffc

What's the purpose of it?

My initial position is that ICU LocaleMatcher is insufficient and limited, and instead we should:

  • Provide building blocks for people to build locale negotiation libraries
  • Consider investing in better locale negotiation logic if we feel like we want to step in

We're doing it for a number of components that are due for a complete rewrite and I believe Locale Matching belongs to that list.

zbraniecki avatar Jan 25 '23 16:01 zbraniecki

@zbraniecki Here is the call site in Fuchsia:

https://cs.opensource.google/fuchsia/fuchsia/+/main:src/lib/intl/lookup/rust/src/lib.rs;l=353

CC @filmil

sffc avatar Jan 25 '23 17:01 sffc

Exciting! So, what this code does is a variant of https://github.com/projectfluent/fluent-langneg-rs/ using NegotiationStrategy::Filtering. Note also that NegotiationStrategy::Lookup is a drop-in replacement for a direct ICU LocaleMatcher call, in case someone really wants to use that.

Unfortunately, because it uses the ICU API which is badly suited for this task, it performs a lot of unnecessary allocations. The equivalent code in fluent-langneg is here.

I am generally confident that Fluent langneg is a pure improvement over ICU LocaleMatcher from the API and algo standpoint, but I'm not as strongly opinionated as to whether it has to be the locale negotiation algorithm for all use cases.

For that reason, wearing the conservative hat, I suggest we develop icu4x-langneg API based on fluent-langneg as a standalone library (or experimental?) and ensure that all ICU APIs required to power it (maximize/minimize/parentlocales etc.) are available. Wearing a less conservative hat, I suggest we just take fluent-langneg and put it in ICU4X.
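To make the Filtering vs. Lookup distinction above concrete, here is a minimal sketch using plain strings. It is not the fluent-langneg or ICU LocaleMatcher algorithm: the `Strategy` enum, `tag_matches`, and `negotiate` are hypothetical names, tags are assumed to be already normalized, fallback is plain subtag truncation (no likely-subtags maximization), and the default is only used when nothing matches.

```rust
/// Hypothetical, simplified stand-in for a negotiation strategy.
#[derive(Clone, Copy, PartialEq)]
enum Strategy {
    /// Collect every available locale that matches any requested locale.
    Filtering,
    /// Stop at the first (best) match.
    Lookup,
}

/// True if `available` equals `candidate` or extends it with more subtags
/// (e.g. candidate "fr" matches "fr" and "fr-FR", but not "fra").
fn tag_matches(candidate: &str, available: &str) -> bool {
    available == candidate
        || (available.starts_with(candidate)
            && available.as_bytes().get(candidate.len()) == Some(&b'-'))
}

/// Assumption-laden sketch: walks requested locales in priority order,
/// truncating trailing subtags ("en-US" -> "en") until something matches.
fn negotiate<'a>(
    requested: &[&str],
    available: &[&'a str],
    default: &'a str,
    strategy: Strategy,
) -> Vec<&'a str> {
    let mut result: Vec<&'a str> = Vec::new();
    for req in requested {
        let mut candidate: &str = req;
        loop {
            for &av in available {
                if tag_matches(candidate, av) && !result.contains(&av) {
                    result.push(av);
                    if strategy == Strategy::Lookup {
                        // Lookup wants exactly one answer.
                        return result;
                    }
                }
            }
            // Fall back by dropping the last subtag, if there is one.
            match candidate.rfind('-') {
                Some(idx) => candidate = &candidate[..idx],
                None => break,
            }
        }
    }
    if result.is_empty() {
        result.push(default);
    }
    result
}
```

With requested `["fr-CA", "en-US"]` and available `["en", "en-US", "fr", "fr-FR", "de"]`, this sketch's Filtering returns an ordered fallback chain, while Lookup returns a single locale. That single-result shape is why Lookup can stand in for a LocaleMatcher-style "best match" call.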

zbraniecki avatar Jan 25 '23 17:01 zbraniecki

Is there a test corpus for:

  1. ICU LocaleMatcher, which I could run NegotiationStrategy::Lookup against to verify compatibility
  2. Fuchsia's fallback matching, which I could run NegotiationStrategy::Filtering against to verify compatibility?

zbraniecki avatar Jan 25 '23 17:01 zbraniecki

Adding needs-approval so @filmil can weigh in

sffc avatar Jan 25 '23 17:01 sffc

Happy to do a deep dive on that if you think it's getting urgent.

zbraniecki avatar Jan 25 '23 18:01 zbraniecki

@zbraniecki points out that there is a Rust locale matcher crate that uses icu_locid, which may be an option for clients like Fuchsia.

sffc avatar Feb 02 '23 18:02 sffc

What's the progress of this feature? It's very useful and necessary when handling i18n.

Berrysoft avatar Jun 25 '23 16:06 Berrysoft

I'm working on a design doc for this

zbraniecki avatar Oct 23 '25 16:10 zbraniecki