ecma402

Detecting directionality from script type (or from a language associated with a script)

Open brettz9 opened this issue 6 years ago • 16 comments

To set default text directionality, I think it would be helpful to have a means of mapping scripts (by ISO 15924 code) to their directionality (RTL, LTR, Inherited, T-to-B, or Varies). Even more useful would be a mapping from languages that have a specific "Suppress-Script" in the IANA registry (indicating that the language is predominantly written in a given script) to that script's directionality. One could then have a standard means (using HTML dir for RTL languages or CSS writing modes for T-to-B ones) of programmatically setting directionality for locales, or for excerpts of content in other languages when directionality information is not also present.
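
A minimal sketch of the kind of mapping requested here, assuming a hand-written table. The names `SCRIPT_DIRECTION`, `directionForScript`, and `stylingFor`, the table contents, and the lowercase direction labels are all illustrative assumptions, not part of any standard API; real data would have to come from a source such as CLDR script metadata.

```javascript
// Hypothetical, deliberately incomplete table from ISO 15924 script codes
// to a default direction. A real table would be generated from script metadata.
const SCRIPT_DIRECTION = new Map([
  ['Arab', 'rtl'],
  ['Hebr', 'rtl'],
  ['Syrc', 'rtl'],
  ['Thaa', 'rtl'],
  ['Latn', 'ltr'],
  ['Cyrl', 'ltr'],
  ['Grek', 'ltr'],
  ['Mong', 'ttb'], // traditional Mongolian is written top to bottom
]);

// Returns 'rtl', 'ltr', 'ttb', or undefined when the script is not in the table.
function directionForScript(script) {
  return SCRIPT_DIRECTION.get(script);
}

// Translates a direction into an HTML dir value or a CSS writing mode.
function stylingFor(direction) {
  switch (direction) {
    case 'rtl': return { dir: 'rtl' };
    case 'ltr': return { dir: 'ltr' };
    case 'ttb': return { writingMode: 'vertical-lr' };
    default: return {}; // unknown: leave the inherited styling alone
  }
}
```

A caller could then apply `stylingFor(directionForScript('Arab'))` to an element, setting its `dir` attribute or its `writing-mode` style depending on which key is present.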

brettz9 avatar Dec 12 '17 12:12 brettz9

That's issue #46. At Mozilla we landed the mozIntl.getLocaleInfo API, which has the following signature:

let data = mozIntl.getLocaleInfo('ar');

data === {
  'locale': 'ar',
  'direction': 'rtl'
};

and in the future we plan to add more bits to it.

zbraniecki avatar Dec 12 '17 15:12 zbraniecki

Great, thanks! If one just wanted to know the script's directionality, one could just add the 4-letter script code to any language I presume, e.g., en-Arab would give rtl for Arabic's direction (if one didn't happen to know what language that script was associated with), right? Not critical, but I'm just curious if it'd allow detection of any ISO 15924 scripts besides the two-letter language (i.e., not writing system) codes.

brettz9 avatar Dec 12 '17 23:12 brettz9

Btw, I don't see mozIntl in Nightly (desktop)--is it behind a flag?

brettz9 avatar Dec 13 '17 00:12 brettz9

I see a bunch of things we can do here:

  1. Expose a mapping from scripts to direction (as @brettz9 is asking for)
  2. Expose a direct mapping from locales to directions (as @zbraniecki is suggesting)
  3. Expose likely subtags for locales, to get the script out, so it's possible to go from locale to script using 1.
  4. Expose the directionality properties of individual code points used by the BiDi algorithm (not sure if anyone wants this).

To get the direction from an API based on Intl.Locale, with the data coming from the script as in 3, can I suggest something like this?

new Intl.Locale('ar').withLikelySubtags().direction
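
A rough sketch of how options 1 and 3 could combine, assuming a runtime that ships Intl.Locale, where the likely-subtags step is exposed as maximize(). The `RTL_SCRIPTS` set and the `directionFromLikelyScript` helper are hypothetical and cover only a handful of scripts; the sketch also treats every other script as left-to-right, which ignores vertical scripts.

```javascript
// Hypothetical, incomplete set of right-to-left ISO 15924 script codes.
const RTL_SCRIPTS = new Set(['Arab', 'Hebr', 'Syrc', 'Thaa', 'Nkoo', 'Adlm']);

// Adds likely subtags to the locale, then derives a direction from the script.
// Simplification: anything not in RTL_SCRIPTS is reported as 'ltr'.
function directionFromLikelyScript(tag) {
  const { script } = new Intl.Locale(tag).maximize();
  if (script === undefined) return undefined; // no likely script known
  return RTL_SCRIPTS.has(script) ? 'rtl' : 'ltr';
}

console.log(directionFromLikelyScript('ar'));      // script is filled in from likely subtags
console.log(directionFromLikelyScript('en-Arab')); // an explicit script subtag is kept as-is
```

Deriving the direction from the script rather than from the language means a tag with an explicit script subtag, such as en-Arab, is handled without any per-language data.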

littledan avatar Dec 13 '17 17:12 littledan

withLikelySubtags would presumably return an array then? The IANA registry's "Suppress-Script" field (which indicates that the script subtag should generally be omitted because the language is almost always written in that script) lists only one script, which stands to reason in that context. But I suppose an array sorted by likelihood would be all the better if the data is available and that's what you mean.

1 with 3 sounds excellent to me (and if 2 were exposed as well, wfm). As for 4, I don't currently have a need for it myself, but FWIW, it was brought up at https://github.com/tc39/ecma402/issues/90#issuecomment-225251032 .

brettz9 avatar Dec 14 '17 00:12 brettz9

I was imagining that withLikelySubtags would give you a new Intl.Locale instance, based on CLDR's likely subtags data. Unfortunately, though, this just provides one script. Is there a good data source we should look into exposing to get this sort of array?

(To continue the story: That Locale would have a particular direction derived from its script, rather than script being undefined and direction throwing an error because of the missing script).

littledan avatar Dec 14 '17 00:12 littledan

I see.

No, I was just thinking that since you mentioned the API as withLikelySubtags in the plural that you knew of a data source with multiple subtags for each language. I don't know of any source for that. But I guess if it ever turned up and there was desire for it, withPossibleSubtags could presumably be added.

Re: locale having a direction derived from its script, do you mean direction derived from its language when no script is explicit? If so, SGTM...

brettz9 avatar Dec 14 '17 01:12 brettz9

The subtags here are the region and script :/

littledan avatar Dec 14 '17 01:12 littledan

Ahh, gotcha... Ok, no worries--I think that should work... While it's helpful to try to accommodate possible use cases, I think a single script association should cover the most critical cases...

brettz9 avatar Dec 14 '17 01:12 brettz9

> I think a single script association should cover the most critical cases

To use that, the script for a given locale would have to be obtained one way or another, wouldn't it? I'm afraid average developers are less familiar with scripts than with locales/languages. So, I think what @littledan suggested may work better.

jungshik avatar May 07 '18 19:05 jungshik

Note: we ended up naming the withLikelySubtags method as maximize in https://github.com/tc39/proposal-intl-locale/pull/30

littledan avatar May 10 '18 08:05 littledan

I intend to address this issue by proposing https://github.com/FrankYFTang/proposal-intl-locale-info/

FrankYFTang avatar Aug 05 '20 00:08 FrankYFTang

@sffc Now that the Locale info proposal is Stage 3, can we close this? Or perhaps we would want to close on Stage 4? At the very least we should drop the "user preferences" tag I feel.

ryzokuken avatar May 28 '21 17:05 ryzokuken

Agree w/ @ryzokuken

FrankYFTang avatar Jun 02 '21 18:06 FrankYFTang

I dropped the label, but I prefer to keep these issues open until the proposal reaches Stage 4 and is actually merged into the standard.

sffc avatar Jun 04 '21 23:06 sffc

This is being addressed by the Intl Locale Info Proposal

https://github.com/tc39/proposal-intl-locale-info?tab=readme-ov-file#text-information
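
For illustration, a hedged sketch of reading the direction through that proposal. It assumes the engine exposes either the getTextInfo() method or the older textInfo accessor that some engines shipped first; the `textDirection` wrapper is a hypothetical helper and returns undefined where neither form exists.

```javascript
// Hypothetical helper: reads the text direction from Intl Locale Info,
// returning undefined when the engine supports neither form.
function textDirection(tag) {
  const locale = new Intl.Locale(tag);
  if (typeof locale.getTextInfo === 'function') {
    return locale.getTextInfo().direction;
  }
  if (locale.textInfo !== undefined) {
    return locale.textInfo.direction; // older accessor form
  }
  return undefined;
}

console.log(textDirection('ar'));
console.log(textDirection('en'));
```

Checking for both forms keeps the helper usable across engine versions, at the cost of callers having to handle an undefined result.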

sffc avatar May 02 '24 23:05 sffc