Standardize "Implementation-dependent behavior"

Open littledan opened this issue 7 years ago • 17 comments

The specification section Implementation Dependent Behaviour in ECMA 402 allows for implementations to choose various extensions in BCP 47 tags or options in the options bags. I think we should explicitly specify which tags and options are interpreted how, and the implementation-defined behavior should be restricted to the actual locale data which backs these options. (Alternatively, we could make a normative, unversioned reference to CLDR and require that all tags are passed through, but maybe it'd be better to think through each particular thing that's added.)
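A minimal sketch of the two channels mentioned here, assuming a runtime that implements `Intl.Collator`: the `kn` (numeric collation) key can be requested either through a Unicode extension in the tag or through the options bag, and `resolvedOptions()` reports what was actually honored. The locale `en` is an arbitrary choice for illustration, and the printed values may differ between implementations.

```javascript
// The same request expressed two ways: as a Unicode extension in the
// language tag ("-u-kn-true") and as a property of the options bag.
const viaTag = new Intl.Collator("en-u-kn-true").resolvedOptions();
const viaOption = new Intl.Collator("en", { numeric: true }).resolvedOptions();

// resolvedOptions() shows which locale and settings the implementation chose.
console.log(viaTag.locale, viaTag.numeric);
console.log(viaOption.locale, viaOption.numeric);
```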

littledan avatar Oct 19 '16 09:10 littledan

@NorbertLindenberg Could you explain what went into the current design, if this is documented in any email threads, for example?

littledan avatar Oct 19 '16 11:10 littledan

In 2012, there was a strong desire from TC 39 to have every detail of the behavior of the Internationalization API specified; however, in many cases that was not possible. Section 4.3 of the Specification provides some of the reasons. In essence, we knew that the API would be implemented on top of at least two different internationalization libraries, the one in Windows and ICU, both very big and constantly evolving in response to market needs and actual changes in the real world that they reflect, and with somewhat different functionality. Limiting the API functionality to the common subset would have meant adding code to disable functionality already implemented in one of the libraries and thus reduce the value of the API. So there were lots of case-by-case discussions whether to restrict functionality or allow differences between implementations. To see all the details, you'd have to read all the messages referring to internationalization on es-discuss.

NorbertLindenberg avatar Oct 20 '16 12:10 NorbertLindenberg

I wonder how common it is for implementers to actually take advantage of this capability. Even if I can see that the market may want to evolve, if it does this evolution in a way which is different in different web browsers, then code taking advantage of ECMA 402 won't be able to depend on it reliably. I don't know of any custom extensions for Intl in V8 that remain besides Intl.v8BreakIterator, but @jungshik could confirm. How about other engines? @bterlson @thetalecrafter @zbraniecki @jswalden

littledan avatar Oct 20 '16 16:10 littledan

We have a chrome-only mozIntl API, and the policy is to add new APIs there, aim to standardize them here, and then eventually move them from mozIntl to Intl and expose them to content. When we change the behavior of an already existing API (NumberFormat.formatToParts etc.), we put it behind a flag until it gets standardized here.

So, to answer your question, we do not alter any behavior, we do not extend Intl and we do not mean to do that.

zbraniecki avatar Oct 20 '16 16:10 zbraniecki

I do not know of any custom extensions to Intl in JSC. There are a few options marked as optional in the spec that JSC ignores for now, including the caseFirst option for Collator.
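One way a script could detect whether an engine honors an optional setting like this is to request it and inspect `resolvedOptions()`. This is only a sketch: `honorsCaseFirst` is a hypothetical helper name, and the default locale `en` is an assumption made for illustration.

```javascript
// Hypothetical helper: returns true when the engine reports that it applied
// the requested caseFirst value, and false when it ignored the option.
function honorsCaseFirst(locale = "en") {
  const resolved = new Intl.Collator(locale, { caseFirst: "upper" }).resolvedOptions();
  return resolved.caseFirst === "upper";
}

console.log(honorsCaseFirst());
```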

vanwagonet avatar Oct 20 '16 16:10 vanwagonet

Intl.v8BreakIterator is the only extension v8 currently has.

@thetalecrafter Is there a reason to ignore caseFirst? It'd be easy to support it with ICU (JSC uses ICU, doesn't it?).

jungshik avatar Oct 20 '16 18:10 jungshik

Work started, but hasn't made progress in a long while. Yes, JSC uses ICU. Most of the work for Intl in JSC has come from community contributors (like Sukolsak and me), so progress relies on when we have spare time.

IMHO, the more optional pieces and chance for variation, the more frustrating the APIs probably are to users. It'd be nice to reduce the implementation-specific behavior.

vanwagonet avatar Oct 21 '16 16:10 vanwagonet

see https://github.com/tc39/ecma402/issues/111#issuecomment-255544546

allenwb avatar Oct 22 '16 18:10 allenwb

As far as API surface goes -- objects, functions, properties -- SpiderMonkey implements the spec. But https://github.com/tc39/ecma402/issues/113#issue-183908643 is really concerned about conditioning behavior on subtags in a language tag, where conditioning isn't mandated by the spec. That's very different. I'm of two minds about it.

I don't really know ICU. I don't know what additional subtags ICU might opaquely-but-helpfully let affect behavior. So it's super-convenient for the spec to offer leeway for semantics I don't know about. :-)

But: unstandardized-but-widely-implemented behaviors are unfortunate, because nobody should rely on them, yet eventually it's somewhat probable someone will. Then the other implementations have to discover these quasi-requirements the hard way. :-(

In this particular case, however: Intl is totally best-effort. There's no guarantee whether or how any particular locale is supported, how good a locale's data is, &c. In other words, Intl's exact behavior is pretty often unreliable. So users already must deal with behavioral differences like those of an unmandated subtag.

Given that, I probably lean toward not eliminating implementation-dependent behavior, but I don't feel strongly about it. If other people really want to eliminate this leeway, I'm fine with that.

jswalden avatar Oct 26 '16 00:10 jswalden

@jswalden I think there are different dimensions of unsupportedness. There's "this locale is missing" (which is surfaced by resolvedOptions() and other locale APIs), there's "this locale data gives a different answer on one platform than another" (which should just be expected, as data improves over time), and then there's "this API exists or doesn't exist". The latter strikes me as completely different--in realistic implementations, I'd expect variation to be more based on what programmers decided to do in binding to underlying internationalization libraries, rather than what data files are being kept around.
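As a rough illustration of the first kind ("this locale is missing"), the sketch below asks which requested locales are available and which one a formatter actually resolves to. The tags `tlh` and `en-US` are arbitrary examples chosen for illustration, and the results depend on the locale data the runtime ships.

```javascript
// Locales to ask for, in priority order (example values only).
const requested = ["tlh", "en-US"];

// Subset of the requested locales that the implementation has data for.
const supported = Intl.DateTimeFormat.supportedLocalesOf(requested);

// Locale the formatter actually ended up using after negotiation.
const resolved = new Intl.DateTimeFormat(requested).resolvedOptions().locale;

console.log(supported, resolved);
```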

@allenwb Do you have an example of an ECMA 402 user who may want to take advantage of this flexibility? I imagine lots of non-web users would want to know in a strong way what they have access to, API-wise. For example, if Node.js has multiple VMs backing it, it would probably be useful for users to have the same ECMA 402 APIs from all of them.

littledan avatar Oct 26 '16 09:10 littledan

See https://github.com/tc39/ecma402/issues/111#issuecomment-256314321 for why I think we should not have hooks for hosts.

annevk avatar Oct 26 '16 10:10 annevk

@littledan Well, I was one of the people pushing for not allowing such implementation flexibility in the first edition of ECMA-402. The desire for the flexibility came from people actually implementing OS-level I18N support. They were "in the room" and their arguments were convincing enough that the implementation dependencies were included in the first edition. My primary pushback now is that we should not be changing ECMA-402 in that regard without feedback from the same (or similar) platform I18N experts. At the very least, the discussions Norbert referenced in https://github.com/tc39/ecma402/issues/113#issuecomment-255094936 should be reviewed to see whether or not they are still applicable.

Clearly, it is better for ECMA-402 "users" (i.e., JS programmers using the ECMA-402 API) if it is as comprehensive as possible and 100% interoperable across all platforms and implementations. Unless... that requirement causes some implementations to simply not include ECMA-402 support, or to pick, in an ad hoc manner, the pieces of ECMA-402 they choose to implement, with or without changes.

allenwb avatar Oct 26 '16 21:10 allenwb

@allenwb - I think we can safely say that the idea that we will extend Intl via some flexible implementation extensions has not passed the test of time. None of the implementations that we know of currently uses any of the flexibility that we provided.

Wearing my implementer's hat, I can say that the degree of flexibility offered is far too low to allow me to rely on it. And I believe at this point Mozilla is a good example of a project vitally interested in extending Intl, because we want to transition toward using it exclusively for our UI. What we ended up doing (mozIntl) does not require or benefit from that flexibility.

To @jswalden's point: while I agree that Intl is an API that requires users to accept a variety of outputs and that it is a best-effort approach, the flexibility that we still carry without any known use case, 4 years after the release of the API, introduces an additional level of potential confusion.

My vote is to define required behavior, and minimize/remove open-ended extensions.

That being said, I can imagine us, in the future, introducing optional behavior (like we do with language matching), which is a known variability (vs. the unknown variability that @littledan listed). An example would be an API like UnitFormat that would have a list of units that may be supported, and implementations may decide not to support some of them. But I would suggest we do not allow implementations to support additional units that are not in the spec.

zbraniecki avatar Jan 24 '17 22:01 zbraniecki

Probably a time-boxed topic for upcoming meetings to try to achieve consensus on this.

caridy avatar Aug 10 '17 17:08 caridy

We had a couple of reasons to allow "Implementation-dependent behavior".

  1. To enable full use of ICU date/time support vs other platforms, e.g. not to limit platforms with better/wider support just so we can fix* the standard
  2. Naive vs. best match algorithms for locale matching for example
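The second point is still visible in the API as the `localeMatcher` option, which accepts `"lookup"` or `"best fit"`. A small sketch, assuming a runtime with `Intl.DateTimeFormat`; the requested tags are arbitrary examples, and the `"best fit"` result is allowed to differ between implementations (it may also be identical to `"lookup"`).

```javascript
// Example request list (illustrative values only).
const wanted = ["de-CH", "fr"];

// "lookup" uses the BCP 47 lookup algorithm; "best fit" lets the
// implementation use its own, possibly better-suited, matching.
const lookup = Intl.DateTimeFormat.supportedLocalesOf(wanted, { localeMatcher: "lookup" });
const bestFit = Intl.DateTimeFormat.supportedLocalesOf(wanted, { localeMatcher: "best fit" });

console.log(lookup, bestFit);
```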

We should discuss if and what can be fixed* at this point, given the passage of time, and actual library usage.

  * fixed = frozen

nciric avatar Jan 18 '18 23:01 nciric

@gibson042 Can you take a pass over the spec and enumerate any remaining topics that we should resolve based on this issue?

sffc avatar Jun 05 '20 20:06 sffc

@eemeli has a new proposal, "Stable Formatting", which aims to introduce algorithmic formatting for clients who need it: https://github.com/eemeli/proposal-stable-formatting

sffc avatar Sep 18 '23 23:09 sffc