ecma402
Detecting directionality from script type (or from a language associated with a script)
In order to set default text directionality, I think it would be helpful to have a means of mapping scripts (identified by ISO 15924 code), or, even more usefully, languages that have a specific "Suppress-Script" in the IANA registry (indicating a predominant association of that language with a given script), to the script's directionality (RTL, LTR, Inherited, T-to-B, or Varies). One could then have a standard means of programmatically setting directionality for locales, or for excerpts of content in other languages when directionality information is not also present (using the HTML `dir` attribute for RTL languages, or CSS writing modes for T-to-B ones).
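As a rough illustration of the mapping described above, here is a minimal sketch, assuming a hand-written table. `SCRIPT_DIRECTION`, `directionForScript`, and `wrapExcerpt` are hypothetical names, and the table covers only a few horizontal scripts (no Inherited, T-to-B, or Varies entries). Unknown scripts fall back to `dir="auto"`, which lets the browser guess from the content.

```javascript
// Hypothetical, partial table from ISO 15924 script codes to direction.
// A real table would be generated from Unicode/CLDR data.
const SCRIPT_DIRECTION = new Map([
  ['Arab', 'rtl'], // Arabic
  ['Hebr', 'rtl'], // Hebrew
  ['Syrc', 'rtl'], // Syriac
  ['Thaa', 'rtl'], // Thaana
  ['Latn', 'ltr'], // Latin
  ['Cyrl', 'ltr'], // Cyrillic
  ['Grek', 'ltr'], // Greek
  ['Deva', 'ltr'], // Devanagari
]);

// Returns 'rtl', 'ltr', or undefined when the script is not in the table.
function directionForScript(script) {
  return SCRIPT_DIRECTION.get(script);
}

// Minimal escaping for the excerpt's text content.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Wraps an excerpt in a <span> carrying lang and dir; uses dir="auto"
// when the script's direction is unknown.
function wrapExcerpt(text, lang, script) {
  const dir = directionForScript(script) ?? 'auto';
  return `<span lang="${lang}" dir="${dir}">${escapeHtml(text)}</span>`;
}
```

Falling back to `auto` rather than `ltr` avoids forcing a wrong direction on content whose script the table does not know.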
That's issue #46. At Mozilla we landed the `mozIntl.getLocaleInfo` API, which has the following signature:
```javascript
let data = mozIntl.getLocaleInfo('ar');
// data is:
// {
//   locale: 'ar',
//   direction: 'rtl'
// }
```
and in the future we plan to add more bits to it.
Great, thanks! If one just wanted to know a script's directionality, I presume one could add the four-letter script code to any language, e.g., `en-Arab` would give `rtl` for Arabic's direction (if one didn't happen to know which language that script was associated with), right? Not critical, but I'm curious whether it would work for any ISO 15924 script code, and not only for the two-letter language codes (which identify languages, not writing systems).
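A sketch of the `en-Arab` idea, assuming the same kind of hypothetical partial table (`SCRIPT_DIRECTION` and `directionOfExplicitScript` are illustrative names). `Intl.Locale` exposes the explicit script subtag, so the language part plays no role in the answer. The undetermined language `und` (as in `und-Hebr`) avoids picking a language at all.

```javascript
// Hypothetical, partial script-to-direction table (ISO 15924 codes).
const SCRIPT_DIRECTION = new Map([
  ['Arab', 'rtl'],
  ['Hebr', 'rtl'],
  ['Latn', 'ltr'],
  ['Cyrl', 'ltr'],
]);

// Reads only the explicit script subtag; the language subtag is ignored,
// so "en-Arab" and "ur-Arab" give the same answer.
function directionOfExplicitScript(tag) {
  const { script } = new Intl.Locale(tag);
  return script === undefined ? undefined : SCRIPT_DIRECTION.get(script);
}

directionOfExplicitScript('en-Arab');  // 'rtl'
directionOfExplicitScript('und-Hebr'); // 'rtl'
directionOfExplicitScript('en');       // undefined (no script subtag)
```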
Btw, I don't see `mozIntl` in Nightly (desktop). Is it behind a flag?
I see a bunch of things we can do here:
1. Expose a mapping from scripts to directions (as @brettz9 is asking for).
2. Expose a direct mapping from locales to directions (as @zbraniecki is suggesting).
3. Expose likely subtags for locales, to get the script out, so it's possible to go from locale to script, and then to direction using 1.
4. Expose the directionality properties of individual code points used by the BiDi algorithm (not sure if anyone wants this).
To get the direction, with an API based on `Intl.Locale` and the data coming from the script as in 3, can I suggest something like this?

```javascript
new Intl.Locale('ar').withLikelySubtags().direction
```
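Since `withLikelySubtags` later shipped as `maximize()`, the proposed `.direction` can be approximated with today's `Intl.Locale`. A minimal sketch, assuming a hypothetical partial script table and a hypothetical helper `likelyDirection`; this is not the API the proposal eventually adopted.

```javascript
// Hypothetical, partial script-to-direction table (ISO 15924 codes).
const SCRIPT_DIRECTION = new Map([
  ['Arab', 'rtl'],
  ['Hebr', 'rtl'],
  ['Latn', 'ltr'],
  ['Cyrl', 'ltr'],
]);

// Approximates the proposed `.direction`: fill in the likely script with
// maximize() (CLDR likely subtags), then look that script up in the table.
function likelyDirection(tag) {
  const script = new Intl.Locale(tag).maximize().script;
  return SCRIPT_DIRECTION.get(script);
}

likelyDirection('ar');      // 'rtl' (ar -> ar-Arab-EG)
likelyDirection('he');      // 'rtl' (he -> he-Hebr-IL)
likelyDirection('ru');      // 'ltr' (ru -> ru-Cyrl-RU)
likelyDirection('en-Arab'); // 'rtl' (an explicit script is kept)
```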
`withLikelySubtags` would presumably return an array then? The IANA registry's "Suppress-Script" field (i.e., indicating that the script should generally not be added to the language code, since the language is generally written in that script anyway) only lists one script (which stands to reason in that context), but I suppose an array sorted by likelihood would be all the better if the data is available, if that's what you mean.
1 combined with 3 sounds excellent to me (and if 2 were exposed as well, wfm). As for 4, I don't currently have a need for it myself, but FWIW, it was brought up at https://github.com/tc39/ecma402/issues/90#issuecomment-225251032 .
I was imagining that `withLikelySubtags` would give you a new `Intl.Locale` instance, based on CLDR's likely subtags data. Unfortunately, though, this provides only one script. Is there a good data source we should look into exposing to get this sort of array?
(To continue the story: that Locale would have a particular `direction` derived from its `script`, rather than `script` being undefined and `direction` throwing an error because of the missing script.)
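The failure mode described above can be sketched as follows, assuming a hypothetical `strictDirection` helper that only looks at the script actually present on the `Intl.Locale`: a bare `ar` locale has no `script`, while its maximized form does.

```javascript
// Hypothetical, partial script-to-direction table (ISO 15924 codes).
const SCRIPT_DIRECTION = new Map([
  ['Arab', 'rtl'],
  ['Hebr', 'rtl'],
  ['Latn', 'ltr'],
]);

// Strict variant: uses only the script present on the Locale and throws
// when there is none.
function strictDirection(locale) {
  if (locale.script === undefined) {
    throw new RangeError(`no script subtag on "${locale}"`);
  }
  return SCRIPT_DIRECTION.get(locale.script);
}

const ar = new Intl.Locale('ar');
ar.script;                      // undefined
strictDirection(ar.maximize()); // 'rtl'
// strictDirection(ar) would throw a RangeError.
```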
I see.
No, I was just thinking that since you named the API `withLikelySubtags`, in the plural, you knew of a data source with multiple subtags for each language. I don't know of any source for that. But I guess if one ever turned up and there was demand for it, a `withPossibleSubtags` could presumably be added.
Re: the locale having a `direction` derived from its script, do you mean a `direction` derived from its language when no script is explicit? If so, SGTM...
The plural "subtags" here refers to the region and the script, not to multiple scripts :/
Ahh, gotcha... OK, no worries, I think that should work. While it's helpful to try to accommodate possible use cases, I think a single script association should cover the most critical cases...
> I think a single script association should cover the most critical cases

To use that, the script for a given locale would need to be obtained one way or another, wouldn't it? I'm afraid average developers are less familiar with scripts than with locales/languages. So I think what @littledan suggested may work better.
Note: we ended up renaming the `withLikelySubtags` method to `maximize` in https://github.com/tc39/proposal-intl-locale/pull/30
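For reference, this is how `maximize()` and its counterpart `minimize()` behave on a few tags (the outputs assume an engine with standard CLDR likely-subtags data, e.g., Node.js built with full ICU). The `zh-TW` case shows that the likely script can depend on the region, not only on the language.

```javascript
const ar = new Intl.Locale('ar');
const max = ar.maximize();

max.toString();            // 'ar-Arab-EG' (script and region filled in)
max.script;                // 'Arab'
max.minimize().toString(); // 'ar' (likely subtags removed again)

new Intl.Locale('en').maximize().toString();    // 'en-Latn-US'
new Intl.Locale('zh').maximize().toString();    // 'zh-Hans-CN'
new Intl.Locale('zh-TW').maximize().toString(); // 'zh-Hant-TW'
```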
I intend to address this issue by proposing https://github.com/FrankYFTang/proposal-intl-locale-info/
@sffc Now that the Locale Info proposal is at Stage 3, can we close this? Or should we wait until Stage 4? At the very least, I feel we should drop the "user preferences" tag.
Agree w/ @ryzokuken
I dropped the label, but I prefer to keep these issues open until the proposal reaches Stage 4 and is actually merged into the standard.
This is being addressed by the Intl Locale Info Proposal
https://github.com/tc39/proposal-intl-locale-info?tab=readme-ov-file#text-information
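A feature-detecting sketch for reading the direction from the engine. The proposal's README describes a `getTextInfo()` method that returns an object with a `direction` field; some engines shipped an earlier `textInfo` getter with the same shape. `engineDirection` is a hypothetical name, and it returns `undefined` when neither form is available, so callers still need a fallback.

```javascript
// Returns the text direction reported by the engine ('ltr' or 'rtl'),
// or undefined when the Intl Locale Info API is not available.
function engineDirection(tag) {
  const locale = new Intl.Locale(tag);
  if (typeof locale.getTextInfo === 'function') {
    return locale.getTextInfo().direction; // method form from the proposal
  }
  if (locale.textInfo !== undefined) {
    return locale.textInfo.direction; // older getter form
  }
  return undefined;
}

engineDirection('ar'); // 'rtl' where supported, otherwise undefined
```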