Tweak ICU data build settings to include more locale information

Open anarchodin opened this issue 3 years ago • 8 comments

In looking at Deno, I discovered that its Intl seems to share Chrome's non-support for Icelandic.

Since both Node and Edge do support Icelandic, it's clearly not a limitation of V8 as such. In Chrome's case, Google appear to specifically disable support for locales that don't have a full interface translation, which seems like a silly thing to do for a browser. Does Deno inherit that butchered ICU for some reason?
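For anyone who wants to reproduce this, here is a minimal sketch of a check, assuming only the standard `Intl.Collator` API. The helper name `isCollationSupported` and the sample letters are made up for illustration. Where Icelandic data is present, þ and ö should sort after z; where it is missing, `new Intl.Collator("is")` silently falls back to another locale and the order will differ.

```javascript
// Hypothetical helper: true when the runtime's ICU data includes
// collation support for the given BCP 47 language tag.
function isCollationSupported(tag) {
  return Intl.Collator.supportedLocalesOf([tag]).length > 0;
}

// A few letters whose relative order depends on Icelandic tailoring.
const sample = ["ö", "a", "þ", "z", "á", "d", "ð"];

// If "is" is unsupported, the constructor does not throw; it falls back
// to a default locale, so the result depends on the runtime's ICU data.
const sorted = [...sample].sort(new Intl.Collator("is").compare);

console.log("Icelandic collation supported:", isCollationSupported("is"));
console.log("Sorted sample:", sorted.join(" "));
```

Running the same file in Deno, Node and a browser console makes the difference between the ICU builds visible without needing to inspect the data file itself.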

anarchodin avatar Jan 01 '22 21:01 anarchodin

~~Edge also uses V8, so it should not be a V8 problem.~~ Misread: indeed not a V8 issue.

Maybe @bnoordhuis knows more?

lucacasonato avatar Jan 01 '22 21:01 lucacasonato

Deno uses the same ICU data file as Chromium, yes.

bnoordhuis avatar Jan 01 '22 22:01 bnoordhuis

Is that done with awareness that Chrome deliberately cuts out locales, or does that come as a surprise? I certainly wasn't expecting that when I first noticed that Chrome was messing up collation.

anarchodin avatar Jan 02 '22 10:01 anarchodin

@bnoordhuis Could we use the same ICU data file as Node? It seems to have more information for some languages. I thought Node was using the same ICU data file as Chromium (and thus we were using the same file as Node).

lucacasonato avatar Jan 02 '22 10:01 lucacasonato

Node uses a custom ICU build maintained by IBM. We can't use that (well... we could but not really) because the data format isn't stable across ICU releases. Sooner or later we'd be barred from upgrading V8 until Node's ICU catches up.

edit: it's theoretically possible to fork Chromium's ICU and tweak its build to include more locales but ICU's build is... involved... I expect it's an ongoing (not one-off) time sink.

bnoordhuis avatar Jan 02 '22 11:01 bnoordhuis

Ok, that's unfortunate. I'll put this on the backlog for some rainy day (rainy week?) then.

lucacasonato avatar Jan 02 '22 11:01 lucacasonato

I had a reason to look into this again. The official ICU4C tarball contains a file with all of the data, measuring 30 MB against the Chromium file's 10 or so. I altered core/runtime.rs to embed icudt71l.dat (from ICU4C) instead of icudtl.dat (from Chromium, I presume). As far as I can tell, the missing locales are supported in this binary. I recognise that this is a sizeable hit, but it does suggest that the solution might not be a major time sink.

anarchodin avatar Oct 04 '22 12:10 anarchodin

In addition to the fact that some locales are missing entirely, the English locale also doesn't contain all display values. The following sample works as expected in Firefox and Node.js but fails for the second case in Deno (v1.42.4) and Chromium:

const languageNames = new Intl.DisplayNames("en", {
	type: "language",
	languageDisplay: "standard",
});
languageNames.of("eng"); // "English"
languageNames.of("run"); // "rn" (canonical code) instead of "Rundi" in Deno
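To tell a real display name apart from the code being echoed back, a small probe along these lines might help. It is only a sketch: the helper name `hasLanguageName` is hypothetical, and it relies on the standard `fallback: "none"` option, which makes `of()` return `undefined` when no name is available.

```javascript
// Hypothetical helper: true when the runtime's ICU data has a display
// name for the language `code` in the given display locale.
function hasLanguageName(code, locale = "en") {
  const names = new Intl.DisplayNames(locale, {
    type: "language",
    // "none" makes of() return undefined instead of falling back to the code.
    fallback: "none",
  });
  return names.of(code) !== undefined;
}

// The results depend on the ICU data bundled with the runtime.
for (const code of ["eng", "run", "is"]) {
  console.log(code, hasLanguageName(code));
}
```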

kellnerd avatar May 26 '24 15:05 kellnerd