cldrjs

Retrieve available locale name

Open bryanforbes opened this issue 9 years ago • 13 comments

The cldr-data package includes availableLocales.json, which tells users which locales are available within the data. I would like to use it to determine which locale files to load on the fly. However, I don't see a way to seed the Cldr available locales, or to use an array of locale names to get the effective locale name based on the user's locale. Essentially, I'd like to do something like this:

Promise.all([
    request.get('path/to/cldr-data/availableLocales.json'),
    request.get('path/to/cldr-data/supplemental/likelySubtags.json'),
    request.get('path/to/cldr-data/supplemental/parentLocales.json') // not sure I need this, but including it anyway
]).then(function ([ localesResponse, subtagsResponse, parentResponse ]) {
    Cldr.load(subtagsResponse.data, parentResponse.data);
    // Cldr.determineLocale() is the API I'm wishing for; cldrjs does not provide it.
    let locale = Cldr.determineLocale(navigator.language, localesResponse.data.availableLocales);

    return Promise.all([
        request.get(`path/to/cldr-data/main/${locale}/numbers.json`),
        request.get(`path/to/cldr-data/main/${locale}/currencies.json`)
    ]).then(function ([ numbersResponse, currencyResponse ]) {
        Cldr.load(numbersResponse.data, currencyResponse.data);
        return new Cldr(locale);
    });
}).then(function (cldr) {
    // use cldr.get()
});

Using this technique, I could seed the Globalize data as well and create a promise-based API for loading only the locale information needed at any given time.

bryanforbes • Jun 09 '15 13:06

Since you are using cldr-data, you're using resolved data. In that case:

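Cldr.setAvailableBundlesHack = function(availableLocales) {
    // Drop "root" if present (guard: splice(-1, 1) would remove the last locale).
    var rootIndex = availableLocales.indexOf("root");
    if (rootIndex !== -1) availableLocales.splice(rootIndex, 1);
    // Private (underscore-prefixed) cldrjs internal, hence "hack".
    this._availableBundleMapQueue = availableLocales;
};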

Promise.all([
    request.get('path/to/cldr-data/availableLocales.json'),
    request.get('path/to/cldr-data/supplemental/likelySubtags.json')
]).then(function ([ localesResponse, likelySubtagsResponse ]) {
    // Load required data.
    Cldr.load(likelySubtagsResponse.data);

    Cldr.setAvailableBundlesHack(localesResponse.data.availableLocales);

    let bundle = new Cldr(navigator.language).attributes.bundle;

    return Promise.all([
        request.get(`path/to/cldr-data/main/${bundle}/numbers.json`),
        request.get(`path/to/cldr-data/main/${bundle}/currencies.json`)
    ]).then(function ([ numbersResponse, currencyResponse ]) {
        Cldr.load(numbersResponse.data, currencyResponse.data);
        return new Cldr(navigator.language);
    });
}).then(function (cldr) {
    // use cldr.get()
});

If you are using unresolved data instead, you must also load parentLocales.json along with likelySubtags.json. In addition, loading main/${bundle}/numbers.json is no longer sufficient, because you need to load the complete unresolved chain. For example, to support en-IN you would load the en-IN, en-GB (parent locale lookup), en (truncation lookup), and root bundles. The code that figures out the parent bundles is here: https://github.com/rxaviers/cldrjs/blob/master/src/bundle/parent_lookup.js. However, it is not publicly accessible, so you would have to duplicate it in your application or have us figure out how it could be exposed.
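To make the "complete unresolved chain" concrete, here is a minimal sketch of the parent lookup rule described above: use an explicit parent from parentLocales if one exists, otherwise truncate the last subtag, and fall back to root. It is a simplified illustration, not the code in parent_lookup.js. `parentLookup` and `lookupChain` are hypothetical names, the `parentLocales` argument is assumed to be the plain locale-to-parent map inside parentLocales.json (assumed path `supplemental.parentLocales.parentLocale`), and the sample map only encodes the en-IN to en-GB relation mentioned in this comment.

```javascript
// Simplified parent lookup (hypothetical helper, not the cldrjs internals):
// 1. use an explicit parent from parentLocales if there is one;
// 2. otherwise truncate the last subtag ("en-GB" -> "en");
// 3. a single-subtag locale falls back to "root".
function parentLookup(locale, parentLocales) {
  if (locale === "root") {
    return undefined;
  }
  if (Object.prototype.hasOwnProperty.call(parentLocales, locale)) {
    return parentLocales[locale];
  }
  var cut = locale.lastIndexOf("-");
  return cut === -1 ? "root" : locale.slice(0, cut);
}

// Walk from the requested locale up to "root", collecting every bundle
// that unresolved data would need.
function lookupChain(locale, parentLocales) {
  var chain = [];
  for (var current = locale; current !== undefined; current = parentLookup(current, parentLocales)) {
    chain.push(current);
  }
  return chain;
}

// Illustrative map only; load the real one from parentLocales.json.
var sampleParentLocales = { "en-IN": "en-GB" };
console.log(lookupChain("en-IN", sampleParentLocales));
// [ 'en-IN', 'en-GB', 'en', 'root' ]
```

An application would then fetch each locale in the chain that it actually has data for, merging from root toward the requested locale.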

Did I answer your question? Please let me know. I'm keeping this issue open for discussion on whether .setAvailableBundles should be implemented.

rxaviers • Jun 09 '15 14:06

I think .setAvailableBundles() would be a great addition, as would .getBundleName(), to avoid having to do new Cldr(navigator.language).attributes.bundle to get the bundle name.

bryanforbes • Jun 09 '15 15:06

I'd like to add a vote for this functionality, too, and to make sure it is easily accessible to users of Globalize.

As a developer, I really don't want to have to deal with mapping my supported locales over to the proper cldr filenames; I'd love it if cldrjs/globalize would automatically tell me which files to load to support a given locale string.

The above code does seem to work for me (in node.js), although I'm not getting a match for two locale strings which should have proper mappings. Is this a CLDR data issue?

sr-Latn-CS
sr-Cyrl-CS

@rxaviers I'm a little unclear on the "unresolved data" comments above. Do I need to worry about loading all of the parent chains, as well? For example, I notice that en-US resolves to en-US-POSIX -- do I also need to manually load the en files for things to work properly?

dpolivy • Jul 14 '15 22:07

@bryanforbes and @dpolivy, I know this may sound unrelated, but could you both share your development stack with me? I mean, how do you organize your files, and what tooling do you use to bundle the built files for production? This might shed more light on how to address this problem. Thanks

rxaviers • Jul 14 '15 23:07

@rxaviers See https://github.com/jquery/globalize/issues/460 as a starting point. At the moment, I am working to globalize our server-side components built on node.js and express. We use handlebars for template rendering. I was debating between FormatJS and globalize, and decided to try globalize first because it also supported parsing (which is a need of ours to handle user form submissions).

Starting at a high level, our core site is built on Windows (Classic ASP), and as such uses Windows LCIDs for determining date/currency handling. Each user has an LCID associated with their profile. Our Mobile site/apps are built on node.js. So, step one for me is mapping Windows LCIDs to proper locale strings (and ISO currency codes). I just wrote a simple .NET application to iterate through our supported LCIDs, and output the Name and ISO Currency Code to a JSON file that maps LCIDs to those values.

In node, I am building an internal i18n module that:

  • Loads the locale JSON data I created in .NET, and creates a Globalize object for each supported locale. I want to pre-load all of this for performance reasons, and to re-use objects across requests with the same locale.
  • Exports express middleware that looks at the incoming request, and attaches an instance of Globalize (actually, a lightweight wrapper I wrote around that) to the res.locals property on the response. That way the rest of my route handlers all have access to the proper object for that request.
  • Provides a set of handlebars helpers (a la FormatJS), such as formatCurrency, formatShortDate, etc that I use in my templates, and when executed will use the res.locals.intl (Globalize) object to render out the correctly localized version of the data.

Hopefully that makes sense. The challenges I have had so far are that I just want to specify "here are the locales we support", and have globalize just automatically load the right CLDR files for me to make it work (I can do the LCID -> locale string mapping myself).

dpolivy • Jul 14 '15 23:07

Please, see my reply about Globalize-only things in here: https://gist.github.com/rxaviers/fbca34ead79440b7f16f. (cc @kborchers)

Hopefully that makes sense. The challenges I have had so far are that I just want to specify "here are the locales we support", and have globalize just automatically load the right CLDR files for me to make it work.

Thanks, it did help. Knowing your environment, and that you're using Globalize and cldrjs on the server side (Node.js), helps shape my answer below.

Let's first define "load the right CLDR files" as (a) for each supported locale, load the right bundle, and (b) for each bundle, load the right set of JSON files (e.g., gregorian calendar, numbers, etc).

For (b), it's easy. You'll need to load at least the files of this table https://github.com/jquery/globalize/#2-cldr-content for Globalize.

For (a), it's important to note that cldrjs (and Globalize) doesn't do I/O, to avoid diverging from its original scope (more details in https://github.com/jquery/globalize/issues/405#issuecomment-75952649). Therefore, it doesn't load the data; instead, it expects the developer to load the data and feed it to the library.

Having said that, I understand your problem. I'll give some suggestions below, and I'm open to feedback.

On the server-side, you can

npm install cldrjs cldr-data
var Cldr = require("cldrjs");
Cldr.load(require("cldr-data").entireSupplemental());
Cldr.load(require("cldr-data").entireMainFor("en", "es", "zh"));
var en = new Cldr("en");

The code above loads the entire supplemental data plus the entire main data (calendars, numbers, etc.) for the English, Spanish, and Chinese locales. (Replace cldrjs with Globalize if you like.)

If you have a limited set of supported locales, it's not hard to use the above code to load the data you need and then use cldrjs or Globalize.

If your supported set of locales is almost the entire CLDR (or the entire CLDR), you could:

Cldr.load(require("cldr-data").all());

Does that work for you? If not, what are your supported locales? To see which locales CLDR makes available, check https://github.com/unicode-cldr/cldr-core/blob/master/availableLocales.json. Currently, Unicode publishes two different coverages: modern (the default installed with npm install cldr-data) and full (installed with CLDR_COVERAGE=full npm install cldr-data).

I'm trying to avoid a long explanation, but in short there's no direct mapping between a desired locale and a bundle: it depends on which bundles are available. See details in https://github.com/rxaviers/cldrjs/blob/master/doc/bundle_lookup_matcher.md, and I'll be happy to answer any questions.
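To illustrate why the answer depends on the available bundles, here is a simplified, self-contained sketch of bundle matching. It assumes that a bundle matches a locale when both expand to the same language-script-region id via likely subtags. cldrjs's own matcher, documented in bundle_lookup_matcher.md, implements the full UTS #35 rules; this sketch simplifies them (no "und" handling, no variants). `maximize`, `lookupBundle`, and the tie-breaking rule (prefer the shorter bundle id) are assumptions of this example, and the `likely` table is a tiny hand-picked subset of likelySubtags.json entries.

```javascript
// Split a tag such as "sr-Latn-CS" into language, script, and region subtags.
function parseTag(tag) {
  const [language, ...rest] = tag.split(/[-_]/);
  const subtags = { language, script: undefined, region: undefined };
  for (const part of rest) {
    if (/^[A-Za-z]{4}$/.test(part)) subtags.script = part;
    else if (/^([A-Za-z]{2}|\d{3})$/.test(part)) subtags.region = part;
  }
  return subtags;
}

// Simplified "add likely subtags": try lookup keys from most to least
// specific and fill in only the subtags the input is missing.
function maximize(tag, likelySubtags) {
  const { language, script, region } = parseTag(tag);
  const keys = [[language, script, region], [language, region], [language, script], [language]];
  for (const key of keys) {
    if (key.includes(undefined)) continue;
    const match = likelySubtags[key.join("-")];
    if (match) {
      const m = parseTag(match);
      return [language, script || m.script, region || m.region].join("-");
    }
  }
  return tag;
}

// Map each available bundle to its maximized id, then look up the maximized
// requested locale. On collisions (e.g. "en" vs "en-US") keep the shorter id.
function lookupBundle(locale, availableBundles, likelySubtags) {
  const bundleByMaxId = {};
  for (const bundle of availableBundles) {
    if (bundle === "root") continue;
    const maxId = maximize(bundle, likelySubtags);
    const existing = bundleByMaxId[maxId];
    if (existing && existing.length < bundle.length) continue;
    bundleByMaxId[maxId] = bundle;
  }
  return bundleByMaxId[maximize(locale, likelySubtags)] || null;
}

// A handful of likelySubtags entries, for illustration only.
const likely = {
  en: "en-Latn-US",
  pt: "pt-Latn-BR",
  sr: "sr-Cyrl-RS",
  zh: "zh-Hans-CN",
  "zh-Hant": "zh-Hant-TW",
  "zh-TW": "zh-Hant-TW",
};
const bundles = ["root", "en", "en-GB", "pt", "pt-PT", "sr-Latn", "zh", "zh-Hant"];

console.log(lookupBundle("en-US", bundles, likely)); // en
console.log(lookupBundle("zh-TW", bundles, likely)); // zh-Hant
console.log(lookupBundle("sr-Latn-CS", bundles, likely)); // null
```

Note how zh-TW lands on the zh-Hant bundle even though the two tags share no region subtag, and how sr-Latn-CS finds no bundle because sr-Latn expands to a different region.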

As you can see, our docs need a lot of improvement, and help improving them (or any other contribution) is very much appreciated :).

@rxaviers I'm a little unclear on the "unresolved data" comments above. Do I need to worry about loading all of the parent chains, as well?

Assume you're using resolved data unless you're building the data yourself (via Unicode tools). Therefore, you don't need to worry about loading the chain yourself.

For example, I notice that en-US resolves to en-US-POSIX -- do I also need to manually load the en files for things to work properly?

This is a bug that needs a fix; see #29.

rxaviers • Jul 15 '15 02:07

Thanks, this is really helpful! The approach to loading looks a little simpler than what I am currently doing (basically, require'ing all of the files you linked to in the docs individually). For reference, here's the current list of locales we support as a JSON object mapping the LCID to locale string and currency code. This is basically taken from .NET as I described above.

Now, it seems that I need to map my langId property into the bundle id, which I can then pass to entireMainFor, right? I used the code in your second comment here to make that work -- all of them mapped except for en-029 (en-CB) and the sr-Latn-CS/sr-Cyrl-CS. Also, I couldn't make the setAvailableBundles hack work on the Globalize object -- is that possible, so I don't have to pull in cldrjs directly?

I suppose I could just load all bundles. What's the "cost" of doing that, if they're never used? Just some startup perf and memory to keep the structures around?

dpolivy • Jul 15 '15 06:07

Now, it seems that I need to map my langId property into the bundle id, which I can then pass to entireMainFor, right? I used the code in your second comment here to make that work

Yep. If you want to map your langId (which is actually a locale composed of a language subtag plus a region subtag) into a bundle id, the function from my second comment is your best bet. I'm still reluctant to add it to the library as setAvailableBundles, because developers shouldn't need to worry about setting up bundles; they should worry about loading the data. The library should figure out what to do with the bundles (as it does). The way this proposed method is being used is actually a workaround to figure out which bundle id cldrjs would load if all the given bundles were present. I'm still open to suggestions that better fit this objective.

-- all of them mapped except for en-029 (en-CB) and the sr-Latn-CS/sr-Cyrl-CS. Also, I couldn't make the setAvailableBundles hack work on the Globalize object -- is that possible, so I don't have to pull in cldrjs directly?

CLDR doesn't include data for en-CB or sr-Latn-CS/sr-Cyrl-CS; that's why it didn't work for you.

If you consider it safe for en-CB to inherit data from en (aka en-US), you could use cldrjs's inheritance abilities (support for handling unresolved data; cldrjs loaded from Node.js already has this support). You simply have to either load an empty set of main data, Cldr.load({main: {"en-CB": {}}}), or include "en-CB" as another available bundle in the hack function from my second comment. Similarly, sr-Latn-CS would inherit from sr-Latn, and sr-Cyrl-CS from sr-Cyrl, in the same way.
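For several such locales at once, the empty-bundle option boils down to building one data object of the shape used in the snippet above. The sketch below does only that; `emptyMainBundles` is a hypothetical helper, and whether inheriting from the parent is acceptable for each locale remains the judgment call described above.

```javascript
// Build a data object that registers empty main bundles, so cldrjs's
// inheritance can fall back to each locale's parent (e.g. "en-CB" -> "en").
// The { main: { <locale>: {} } } shape follows the snippet in the comment above.
function emptyMainBundles(locales) {
  const main = {};
  for (const locale of locales) {
    main[locale] = {};
  }
  return { main };
}

const payload = emptyMainBundles(["en-CB", "sr-Latn-CS", "sr-Cyrl-CS"]);
console.log(JSON.stringify(payload));
// {"main":{"en-CB":{},"sr-Latn-CS":{},"sr-Cyrl-CS":{}}}
// With cldrjs available, this would then be passed to Cldr.load(payload).
```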

Also, I couldn't make the setAvailableBundles hack work on the Globalize object -- is that possible, so I don't have to pull in cldrjs directly?

Globalize doesn't export the cldrjs object, because there's no point in doing so for the normal use case. :)

rxaviers • Jul 15 '15 14:07

@dpolivy I have also added one more comment at https://gist.github.com/rxaviers/fbca34ead79440b7f16f (gist doesn't notify)

rxaviers • Jul 15 '15 14:07

I'm still reluctant to add it to the library as setAvailableBundles, because developers shouldn't need to worry about setting up bundles; they should worry about loading the data. The library should figure out what to do with the bundles (as it does). The way this proposed method is being used is actually a workaround to figure out which bundle id cldrjs would load if all the given bundles were present. I'm still open to suggestions that better fit this objective.

I agree with your philosophy here, in that developers shouldn't worry about setting up bundles, and the library shouldn't do I/O. But that gap still needs to be bridged in a way that's easily accessible to the typical developer, who may not be an i18n specialist, and just needs to check a box that says "allow date/number/currency formatting based on a user's locale". I expect most developers approach this in terms of having a list of locale strings (language-region subtags) -- or even LCIDs in the Windows case -- and just expecting it to work.

How do other companies/products that use CLDR data do the mapping from language-region subtags to bundles? Or do they just load all of the data, so it's a non-issue?

If there were a getBundleForLocale API, that might be a little less 'hacky' than the approach above. But I also think setAvailableBundles would be a workable solution.

One other thing I'm still not clear about -- will I need to load any parent bundles in addition to language-specific ones? e.g., if I load en-GB, do I also need en for it to work properly? Or does that not matter since I'm using cldrjs?

I also added a comment in the gist.

dpolivy • Jul 15 '15 23:07

I know it may sound unrelated. But, could you both share with me your development stack? I mean, how do you organize your files, what tooling do you use to bundle the built files for production? This might shed more light into how to address this problem. Thanks

Sorry for the delay in getting back to you. It's been a crazy week! A co-worker and I are currently developing https://github.com/dojo/i18n, which wraps Globalize to provide an automatic loading mechanism for language and localization information. The current proposed API is:

let i18n = new I18n({
    locale: 'en-GB' // this is optional; if not provided, the system locale will be used
});
i18n.load().then(() => {
    // use any functions based on CLDR data
    i18n.formatCurrency(1, 'USD' /* options object can follow */);
});
i18n.loadBundle('widgets/messages/Dialog').then((bundle) => {
    // use messages defined in the bundle
    bundle.get('close' /* variable arguments follow */);

    // or use them from i18n
    i18n.getMessage('widgets/messages/Dialog' /* variable arguments to follow */);
});

This loading mechanism is pretty simple for cldr-data since it is already resolved: fetch cldr-data/availableLocales, cldr-data/supplemental/likelySubtags, and cldr-data/supplemental/parentLocales; load the subtags and parent locales into Cldrjs; load the available locales into Cldrjs as outlined above; retrieve the user's locale as outlined above; fetch numbers.json, currencies.json, and other locale-specific JSON files as well as any supplemental JSON files; and then load the fetched data into Cldrjs. I have the first part (up to and including retrieving the user's locale) coded already. The rest should be fairly straightforward.
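The steps in the paragraph above can be sketched as one async function. Everything I/O- or library-related is injected, so this is a hedged outline rather than dojo/i18n code: `loadCldrFor`, `fetchJson` (resolves to parsed JSON), `load` (standing in for `Cldr.load`), and `resolveBundle` (standing in for the setAvailableBundlesHack plus `new Cldr(locale).attributes.bundle` trick) are hypothetical names. availableLocales.json is assumed to have the `{ availableLocales: [...] }` shape used in the earlier snippets.

```javascript
// Outline of the resolved-data loading steps, with I/O and cldrjs injected.
async function loadCldrFor(locale, { fetchJson, load, resolveBundle, mainFiles, supplementalFiles }) {
  // 1. Fetch the metadata needed to pick a bundle, and load the subtag data.
  const [locales, likelySubtags, parentLocales] = await Promise.all([
    fetchJson("cldr-data/availableLocales.json"),
    fetchJson("cldr-data/supplemental/likelySubtags.json"),
    fetchJson("cldr-data/supplemental/parentLocales.json"),
  ]);
  load(likelySubtags, parentLocales);

  // 2. Work out which bundle serves the user's locale.
  const bundle = resolveBundle(locale, locales.availableLocales);
  if (!bundle) {
    throw new Error(`No CLDR bundle available for ${locale}`);
  }

  // 3. Fetch the locale-specific and supplemental files, then load them together.
  const paths = [
    ...mainFiles.map((name) => `cldr-data/main/${bundle}/${name}.json`),
    ...supplementalFiles.map((name) => `cldr-data/supplemental/${name}.json`),
  ];
  const jsons = await Promise.all(paths.map((path) => fetchJson(path)));
  load(...jsons);
  return bundle;
}

// Example run with in-memory stubs instead of real HTTP requests and cldrjs.
const loaded = [];
loadCldrFor("en-GB", {
  fetchJson: async (path) =>
    path.endsWith("availableLocales.json") ? { availableLocales: ["root", "en", "en-GB"] } : { path },
  load: (...jsons) => loaded.push(...jsons),
  resolveBundle: (requested, available) => (available.includes(requested) ? requested : null),
  mainFiles: ["numbers", "currencies"],
  supplementalFiles: ["currencyData"],
}).then((bundle) => console.log(bundle, loaded.length)); // en-GB 5
```

Injecting `resolveBundle` keeps the private-API workaround in one replaceable place, should cldrjs ever expose a public way to ask for the bundle.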

The loading mechanism for message bundles becomes a bit trickier. I want to allow dojo-i18n to work with both resolved and unresolved message bundles because during development, it's a bit silly to require a build to resolve all of the language bundles. In production, I would expect a build step would take care of the resolving and flattening of language bundles. We haven't worked out all of what needs to happen, but Dojo 1's mechanism did something similar.

The tricky part is determining which unresolved bundles to load. Let's say a library or application has the following locales translated and they are unresolved: root, en, en-GB, and de-AT. The project structure might look like this:

app/
    messages/
        de-AT/
            common.json
        en/
            common.json
        en-GB/
            common.json
        root/
            common.json
    main.js

The loading mechanism will obviously need a way to know which locales are translated (to be determined), but once it has that information, how can it take the user's locale and determine which bundles to fetch to load into Globalize? Cldrjs has a very well-defined mechanism for determining which unresolved data to use, but it's not public. We had this conversation on IRC a while back: the parent_lookup routine is not exposed, so loading mechanisms need to reimplement that portion using properties marked as private (prefixed with _). Once a list of the locales from root to the user's locale is created, it can be compared with the available bundle locales, and the matching bundles can be fetched. For a user from the UK, the chain might be [ 'root', 'en', 'en-GB' ], so all three bundles need to be fetched. A user from Austria's chain would be [ 'root', 'de', 'de-AT' ], so only the root and de-AT bundles need to be loaded.
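The comparison step described above is a simple filter once the chain is known. This sketch assumes both inputs are plain arrays of locale ids, with the chain ordered from root to the user's locale; `bundlesToFetch` is a hypothetical name, and computing the chain itself (the part that needs parent lookup) is out of scope here.

```javascript
// Keep only the chain entries that have a translated message bundle,
// preserving root-to-specific order so later bundles can override earlier ones.
function bundlesToFetch(chain, translatedLocales) {
  const available = new Set(translatedLocales);
  return chain.filter((locale) => available.has(locale));
}

// Translated locales from the example project layout above.
const translated = ["root", "en", "en-GB", "de-AT"];
console.log(bundlesToFetch(["root", "en", "en-GB"], translated)); // [ 'root', 'en', 'en-GB' ]
console.log(bundlesToFetch(["root", "de", "de-AT"], translated)); // [ 'root', 'de-AT' ]
```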

Obviously there are some other things to figure out (where the available locales for bundles get stored, how to have separate available locale lists for different bundles, etc.), but the big sticking point for me (as a developer of a loading mechanism for unresolved data) is knowing which bundles to load. You said:

because developers shouldn't need to worry about setting up bundles, they should worry about loading the data

I'd argue that developers should need to know how to set up their message bundles, but that their message bundles should Just Work™ when they say to load a named bundle. A developer shouldn't need to know which unresolved data to fetch to load into Globalize, something should do that for them. And that's the API I'm writing. I hope you can see how this would greatly benefit developers.

bryanforbes • Jul 16 '15 22:07

@dpolivy

But that gap still needs to be bridged in a way that's easily accessible to the typical developer

I agree. I'm open to suggestions. I believe this gap will be filled by higher-level libraries (perhaps with a narrower scope) that use cldrjs underneath.

How do other companies/products that use CLDR data do the mapping from language-region subtags to bundles? Or do they just load all of the data, so it's a non-issue?

The mapping from a language-region (or in a general sense, a locale) to a bundle is well defined and more details can be found here https://github.com/rxaviers/cldrjs/blob/master/doc/bundle_lookup_matcher.md.

The question about how other companies/products handle the data is a very good (and broad) question. I'd love to have input and some solid data here.

As far as I can tell, there are two different worlds: backend and frontend implementations, whose requirements are quite different. Examples of backend implementations are ICU (for C or Java) and its wrappers (for Ruby, Python, Perl, PHP, etc.). As far as I know, ICU expects CLDR data to be available at a given location with a well-defined structure, and it handles the loading by itself; the data is usually in the LDML format and unresolved, since large space and multiple accesses to this data aren't problems in such environments. On the other hand, frontend implementations have different requirements: the fewer the accesses the better, so CLDR data is usually preferred resolved. Some frontend implementations, like the ibm-js/ecma402 polyfill, handle the loading by narrowing the choices: it uses AMD and expects the CLDR data to be located in a certain way. Other frontend implementations, like andyearnshaw/Intl.js, preprocess the data and distribute only precompiled bundles (https://github.com/andyearnshaw/Intl.js/blob/master/locale-data/jsonp).

Cldrjs is a CLDR traverser. Its goal is to support a wide set of environments (including backend and frontend ones) and to abstract some of the nuances of the UTS #35 specification, in order to facilitate the creation of libraries that use CLDR data.

One other thing I'm still not clear about -- will I need to load any parent bundles in addition to language-specific ones? e.g., if I load en-GB, do I also need en for it to work properly? Or does that not matter since I'm using cldrjs?

Assume you don't need to worry about loading the chain yourself unless you're building the data yourself (via Unicode tools). To understand the difference between resolved and unresolved data: unresolved data optimizes space; resolved data optimizes access. Unresolved data avoids duplication (for example, all fields that en-GB shares with its parents live only in those parents: en or root).
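As a rough illustration of that trade-off, resolving amounts to merging the unresolved bundles along the lookup chain, root first, so the most specific locale wins. This is a simplified sketch: real CLDR resolution has additional rules (such as aliases), `resolveChain` and `deepMerge` are hypothetical helpers, and the data below is toy data, not real CLDR values.

```javascript
// Deep-merge unresolved bundles along a lookup chain (root first), so that
// fields defined in a more specific locale override inherited ones.
// Plain objects are merged recursively (and copied); other values overwrite.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function deepMerge(target, source) {
  for (const [key, value] of Object.entries(source)) {
    if (isPlainObject(value) && isPlainObject(target[key])) {
      deepMerge(target[key], value);
    } else if (isPlainObject(value)) {
      target[key] = deepMerge({}, value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

function resolveChain(chain, unresolvedBundles) {
  return chain.reduce((resolved, locale) => deepMerge(resolved, unresolvedBundles[locale] || {}), {});
}

// Toy data: each locale stores only what differs from its parents.
const unresolved = {
  root: { fields: { a: "root-a", b: "root-b" } },
  en: { fields: { b: "en-b" } },
  "en-GB": { extra: "en-GB-only" },
};
console.log(JSON.stringify(resolveChain(["root", "en", "en-GB"], unresolved)));
// {"fields":{"a":"root-a","b":"en-b"},"extra":"en-GB-only"}
```

Resolved data such as cldr-data ships the result of this merge per locale, which is why a single main/${bundle} directory is enough there.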

rxaviers • Jul 24 '15 13:07

@bryanforbes, I'm very excited for dojo/i18n.

how can it take the user's locale and determine which bundles to fetch to load into Globalize? Cldrjs has a very well-defined mechanism for determining which unresolved data to use, but it's not public

Feel free to submit a new issue or PR with ideas (potentially describing a desired API) for this particular problem.

All that occurs to me is either exposing parent_lookup publicly or being able to use an event for it. For example, I'm wondering whether cldrjs could trigger an event called "parentLookup", passing the locale as an argument. That would make it possible to figure out the parent lookup chain with the code below.

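var hasFetched = {};
// Proposal only: cldrjs does not currently emit a "parentLookup" event.
Cldr.on("parentLookup", function(locale) {
  // fetch() here is the application's own bundle loader, returning truthy on success.
  if (!hasFetched[locale] && fetch(locale)) {
    hasFetched[locale] = true;
  }
});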
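To see how such an event would let a loader collect the chain, here is a self-contained simulation that uses Node's built-in EventEmitter in place of `Cldr.on`. It is hypothetical throughout: cldrjs does not emit this event, `walkParents` is an invented stand-in that only truncates subtags (real parent lookup also consults parentLocales.json), and pushing to an array stands in for the application's fetch.

```javascript
const EventEmitter = require("events");

// Stand-in for cldrjs: emits "parentLookup" for each locale it visits.
const cldrEvents = new EventEmitter();

function walkParents(locale) {
  let current = locale;
  while (current) {
    cldrEvents.emit("parentLookup", current);
    if (current === "root") break;
    const cut = current.lastIndexOf("-");
    current = cut === -1 ? "root" : current.slice(0, cut);
  }
}

// Same pattern as the listener above: fetch each locale's bundle at most once.
const hasFetched = {};
const fetched = [];
cldrEvents.on("parentLookup", (locale) => {
  if (!hasFetched[locale]) {
    fetched.push(locale); // stand-in for the application's fetch(locale)
    hasFetched[locale] = true;
  }
});

walkParents("en-GB");
walkParents("en-US");
console.log(fetched); // [ 'en-GB', 'en', 'root', 'en-US' ]
```

The second walk only triggers one new fetch, because en and root were already loaded for en-GB.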

I would expect a build step would take care of the resolving and flattening of language bundles. We haven't worked out all of what needs to happen, but Dojo 1's mechanism did something similar.

Off-topic, but this could be of interest: https://github.com/jquery/globalize/issues/436

A developer shouldn't need to know which unresolved data to fetch to load into Globalize, something should do that for them. And that's the API I'm writing. I hope you can see how this would greatly benefit developers.

I agree. Off-topic, but dojo/i18n could eventually be listed under Globalize's examples. Please submit a PR when you have something to show.

rxaviers • Jul 24 '15 13:07