icu4x

Add NaiveFallback

Open zbraniecki opened this issue 1 year ago • 18 comments

In relation to #2683.

The current fallback mechanism is quite costly to support, both in binary size and in data payload. To let customers use ICU4X without it, by default we do not deduplicate data that would rely on runtime fallback.

I believe we can (in true Rust fashion!) resolve this dichotomy by introducing a Third Way between no fallback and full fallback. I've dubbed it naive fallback.

Naive fallback works in only one direction, minimizing tags, and carries a very short list of exceptions.

The algorithm works like this:

  1. Take the requested locale and cut out everything except the language-script-region subtags.
  2. Check if there is a match.
  3. If not, cut out region.
  4. Check if there is a match.
  5. If not, check if the given language-region pair is in the exception list.
     5.1. If it is, use the language-script from that exception.
     5.2. If not, remove the script.
  6. Check if there is a match.
  7. If not, use und.

This will cater to the exceptions in sr and zh, but not much more. For everything else it will just cut off subtags from right to left and eventually fall back to und.
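To make the steps above concrete, here is a minimal sketch in plain Rust, using only the standard library. It is not ICU4X code: `SimpleLocale`, `exception_script`, `format_tag` and `naive_fallback` are hypothetical names, the two exception entries are illustrative rather than a proposed list, and input tags are assumed to be well-formed and canonically cased. It follows a literal reading of the steps, so the exception is only consulted after the region has been cut and the remaining tag has failed to match.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified locale: language, script and region only.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SimpleLocale {
    language: String,
    script: Option<String>,
    region: Option<String>,
}

impl SimpleLocale {
    /// Step 1: keep language, script and region; drop everything after them
    /// (variants, extensions). Assumes canonical casing, e.g. "sr-Cyrl-RS".
    fn parse(tag: &str) -> Self {
        let mut parts = tag.split('-');
        let language = parts.next().unwrap_or("und").to_string();
        let (mut script, mut region) = (None, None);
        for part in parts {
            let alphabetic = part.chars().all(|c| c.is_ascii_alphabetic());
            let numeric = part.chars().all(|c| c.is_ascii_digit());
            if script.is_none() && region.is_none() && part.len() == 4 && alphabetic {
                script = Some(part.to_string());
            } else if region.is_none()
                && ((part.len() == 2 && alphabetic) || (part.len() == 3 && numeric))
            {
                region = Some(part.to_string());
            } else {
                break; // variant or extension: ignore the rest of the tag
            }
        }
        SimpleLocale { language, script, region }
    }
}

/// Illustrative exception list (assumption): language-region pairs whose
/// script differs from what plain truncation would give.
fn exception_script(language: &str, region: &str) -> Option<&'static str> {
    match (language, region) {
        ("zh", "TW") => Some("Hant"),
        ("sr", "ME") => Some("Latn"),
        _ => None,
    }
}

/// Joins the subtags that are present with '-'.
fn format_tag(language: &str, script: Option<&str>, region: Option<&str>) -> String {
    let mut tag = language.to_string();
    if let Some(s) = script {
        tag.push('-');
        tag.push_str(s);
    }
    if let Some(r) = region {
        tag.push('-');
        tag.push_str(r);
    }
    tag
}

/// Returns the first candidate found in `available`, or "und" (step 7).
fn naive_fallback(requested: &str, available: &HashSet<&str>) -> String {
    let loc = SimpleLocale::parse(requested);
    let script = loc.script.as_deref();
    let region = loc.region.as_deref();
    // Step 5: look up the exception using the originally requested region.
    let exception = region.and_then(|r| exception_script(&loc.language, r));
    let candidates = [
        format_tag(&loc.language, script, region), // steps 1-2
        format_tag(&loc.language, script, None),   // steps 3-4
        format_tag(&loc.language, exception, None), // steps 5-6
    ];
    candidates
        .iter()
        .find(|c| available.contains(c.as_str()))
        .cloned()
        .unwrap_or_else(|| "und".to_string())
}
```

With a set of available locales containing `en`, `sr`, `sr-Latn` and `zh-Hant`, this sketch resolves `sr-Cyrl-ME` to `sr-Latn`, `sr-Cyrl-RS` to `sr`, `zh-TW` to `zh-Hant`, and `fr-CA` to `und`.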

The algorithm is super small, the data is super small (maybe even baked in by default?), and if used in both datagen and runtime it lets us cut out a huge portion of locales, which in turn reduces the number of keys in the key table of the data payload. This reduction has two benefits:

  1. It makes the data payload smaller, which is especially noticeable in smaller payloads (decimal/symbols is predominantly keys).
  2. It makes runtime locale selection faster, which reduces the cost of the constructor.
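As a rough illustration of the datagen side, the sketch below drops every locale whose payload is identical to the one it would inherit at runtime, which is where the key-table reduction comes from. It is not ICU4X datagen code: `parent`, `inherited` and `deduplicate` are hypothetical helpers, payloads are stand-in strings, and the parent step is simplified to plain right-to-left truncation without the exception list.

```rust
use std::collections::BTreeMap;

/// Simplified parent step (assumption): drop the rightmost subtag,
/// ending at "und", which has no parent.
fn parent(tag: &str) -> Option<String> {
    if tag == "und" {
        return None;
    }
    match tag.rfind('-') {
        Some(idx) => Some(tag[..idx].to_string()),
        None => Some("und".to_string()),
    }
}

/// The payload `tag` would inherit from its nearest ancestor present in `data`.
fn inherited<'a>(tag: &str, data: &'a BTreeMap<String, String>) -> Option<&'a String> {
    let mut current = parent(tag);
    while let Some(p) = current {
        if let Some(payload) = data.get(&p) {
            return Some(payload);
        }
        current = parent(&p);
    }
    None
}

/// Keeps only the entries whose payload differs from the inherited one.
/// Every dropped locale still resolves to an identical payload at runtime,
/// as long as the runtime walks the same parent chain.
fn deduplicate(data: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    data.iter()
        .filter(|(tag, payload)| inherited(tag, data) != Some(*payload))
        .map(|(tag, payload)| (tag.clone(), payload.clone()))
        .collect()
}
```

In this sketch, a table holding the same payload for `und`, `en`, `en-US` and `en-GB` plus distinct payloads for `de` and `de-CH` shrinks from six keys to three (`de`, `de-CH`, `und`).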

zbraniecki, Sep 29 '22 19:09