ECMAScript Proposal: Optional ICU Compatibility for Intl API
ECMA-402 Proposal: Enhanced ICU Integration for Intl API
Champion(s)
[Names of champions]
Stage
Stage 0
Motivation
The Intl API currently provides a standardized way for JavaScript applications to handle internationalization. However, it doesn't fully expose the capabilities of the International Components for Unicode (ICU) library, which is already included in most JavaScript environments. This proposal aims to bridge that gap, providing developers with more powerful and flexible internationalization tools while leveraging existing resources.
Key Points
- Browsers already include full ICU objects: Major browsers like Chrome, Firefox, and Safari already ship with complete ICU implementations. This proposal seeks to expose these existing capabilities more directly.
- Node.js supports configurable ICU data: Node.js allows specifying ICU data files using the NODE_ICU_DATA environment variable. This demonstrates the flexibility and existing support for comprehensive ICU functionality in server-side environments.
- Underutilized resources: Despite the presence of full ICU capabilities in most JavaScript environments, developers cannot fully leverage these resources through the current Intl API.
- Potential for performance improvements: Direct access to ICU functions could reduce the overhead of the current abstraction layer in the Intl API.
Prior Art
This proposal builds directly on the International Components for Unicode (ICU) library, which is already integrated into:
- All major browsers (Chrome, Firefox, Safari, Edge)
- Node.js (with configurable data via NODE_ICU_DATA)
- Java's java.text and java.util packages
- .NET's System.Globalization namespace
- C++'s ICU4C library
Description
We propose introducing an enhanced mode for the Intl API that provides more direct access to the existing ICU capabilities:
const formatter = new Intl.DateTimeFormat('en-US', {
  enhancedICU: true,
  pattern: 'EEEE, MMMM d, y', // ICU date format pattern
  timeZone: 'UTC' // a date-only string parses as UTC midnight; format in UTC so the weekday is stable
});
console.log(formatter.format(new Date('2023-05-17')));
// Output: Wednesday, May 17, 2023

const collator = new Intl.Collator('de-DE', {
  enhancedICU: true,
  strength: 'quaternary' // ICU collation strength
});
console.log(collator.compare('ä', 'a')); // More precise comparison
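For comparison, here is a minimal sketch of how close the current Intl API gets to the two examples above without any enhancedICU option. It assumes a runtime with full ICU data (the default in browsers and recent Node.js releases). The variable names are illustrative, and timeZone: 'UTC' is set so the result does not depend on the local time zone.

```javascript
// Today's option-bag equivalent of the ICU pattern 'EEEE, MMMM d, y'.
const todayFormatter = new Intl.DateTimeFormat('en-US', {
  weekday: 'long',
  month: 'long',
  day: 'numeric',
  year: 'numeric',
  timeZone: 'UTC',
});
const formattedToday = todayFormatter.format(new Date('2023-05-17'));
console.log(formattedToday); // Wednesday, May 17, 2023

// Collation strength is currently reachable only through `sensitivity`.
const baseCollator = new Intl.Collator('de-DE', { sensitivity: 'base' });
const variantCollator = new Intl.Collator('de-DE', { sensitivity: 'variant' });

const baseResult = baseCollator.compare('ä', 'a');       // 0: same base letter
const variantResult = variantCollator.compare('ä', 'a'); // positive: the accent counts
console.log(baseResult, variantResult > 0);
```

The date case is already covered by the option bag. The collation case is where the gap shows: as far as I can tell, sensitivity has no value corresponding to ICU's quaternary strength, so that part of the example has no current equivalent.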
Expensive to Implement in Userland
Implementing ICU-level functionality in pure JavaScript would be prohibitively expensive:
- Data size: Full ICU data is already included in browsers and configurable in Node.js. Reimplementing this in JS would unnecessarily duplicate large amounts of data.
- Algorithm complexity: Many ICU algorithms are highly optimized and would be inefficient if reimplemented in JavaScript.
- Maintenance burden: Keeping up with Unicode standards and CLDR updates is already handled by ICU maintainers.
Broad Appeal
- npm statistics: Popular i18n libraries like moment.js (11M weekly downloads) and date-fns (27M weekly downloads) demonstrate the high demand for advanced date formatting capabilities.
- Framework adoption: React-intl (2.5M weekly downloads) and Angular's i18n module showcase the need for powerful i18n tools in major frameworks.
- High-profile use cases:
- Google Calendar requires advanced date/time formatting and calculations.
- Booking.com needs precise collation for multilingual hotel searches.
- Twitter's language detection and sorting for multilingual content.
Detailed Design
- Introduce an enhancedICU option to all Intl constructors.
- When enhancedICU is true, allow the use of ICU patterns and options directly.
- Provide access to additional ICU features like "compound formats" for DateTimeFormat, "alternate handling" for Collator, etc.
- Expose an Intl.getICUVersion() method to check the underlying ICU version.
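Callers would need a way to tell whether a given engine honors the design above. The following is a rough sketch of feature detection, not part of the proposal itself. The helpers readsOption and getICUVersion are hypothetical names introduced here; Intl.getICUVersion() is the proposed method and does not exist today, and process.versions.icu is the Node.js-only fallback used when it is absent.

```javascript
// Returns true if `Ctor` reads `optionName` from the options bag. Intl
// constructors consume supported options via property reads, so a getter
// reveals whether the engine looked at the option at all.
function readsOption(Ctor, optionName) {
  let wasRead = false;
  const options = {
    get [optionName]() {
      wasRead = true;
      return undefined; // behave as if the option were not set
    },
  };
  try {
    new Ctor('en-US', options);
  } catch {
    // A throw does not matter here; only whether the getter fired.
  }
  return wasRead;
}

// Hypothetical wrapper: prefers the proposed Intl.getICUVersion() if present.
function getICUVersion() {
  if (typeof Intl.getICUVersion === 'function') {
    return Intl.getICUVersion();
  }
  // Node.js reports its bundled ICU version here; browsers do not.
  if (typeof process !== 'undefined' && process.versions && process.versions.icu) {
    return process.versions.icu;
  }
  return undefined;
}

const supportsEnhancedICU = readsOption(Intl.DateTimeFormat, 'enhancedICU');
const readsTimeZone = readsOption(Intl.DateTimeFormat, 'timeZone');
console.log({ supportsEnhancedICU, readsTimeZone, icuVersion: getICUVersion() });
```

On engines that do not implement the proposal, supportsEnhancedICU should come back false while readsTimeZone is true, which confirms the probe itself works.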
Payload Mitigation
This proposal does not increase implementation size because:
- Browsers already include full ICU implementations.
- Node.js allows configuring ICU data separately.
- The proposal exposes existing functionality rather than adding new data or algorithms.
- Any additional code would be minimal, primarily consisting of new API surface to expose existing ICU capabilities.
Compatibility
This proposal is fully backwards-compatible. All existing Intl functionality remains unchanged when enhancedICU is not used.
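One consequence of this is worth illustrating. Current engines ignore option names they do not recognize, so code written against the proposed API would run without error on an older engine and silently fall back to default formatting. A small sketch of that behavior, assuming an engine that does not implement enhancedICU or pattern (variable names are illustrative):

```javascript
const withUnknownOptions = new Intl.DateTimeFormat('en-US', {
  enhancedICU: true,          // proposed option, unknown to current engines
  pattern: 'EEEE, MMMM d, y', // proposed option, unknown to current engines
  timeZone: 'UTC',
});
const plainFormatter = new Intl.DateTimeFormat('en-US', { timeZone: 'UTC' });
const sampleDate = new Date('2023-05-17');

// True when the unknown options had no effect on the output.
const optionsIgnored =
  withUnknownOptions.format(sampleDate) === plainFormatter.format(sampleDate);

// resolvedOptions() only reports options the engine actually understood.
const optionEchoed = 'enhancedICU' in withUnknownOptions.resolvedOptions();

console.log(optionsIgnored, optionEchoed);
```

If both checks behave as expected on a non-implementing engine (ignored, not echoed), callers cannot rely on an exception to detect missing support and would need an explicit capability check.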
Implementation
The implementation would primarily involve creating new bindings between the existing ICU implementations in JavaScript engines and the Intl API surface. Most of the heavy lifting is already done by the included ICU libraries.
Summary
By providing enhanced ICU integration, this proposal significantly improves ECMA-402's internationalization capabilities, aligns closely with existing implementations, and meets the criteria for addition to the specification. It leverages resources already present in JavaScript environments, provides powerful tools that are expensive to implement in userland, has broad appeal, and does not increase payload size.
Could potentially solve #891.
Expanding the Intl API to provide everything that's made available by ICU4C/ICU4J would also require any alternative to those libraries (such as ICU4X) to provide the same features.
It's also not clear from the proposal what the extent of the ask here is; what new options and such would be included?
Do I gather right that the set of options enabled by enhancedICU would be defined separately for each ICU version, and potentially include breaking changes?
Speaking as a proponent of expansion, I have to say that some of the ICU API and implementation has roots going back over 30 years (not a typo). I don't think it's appropriate to just expose everything.
As to the version, there are pros and big cons in "leaking" this level of detail. I'd be more in favor of a way to get at the CLDR version or an equivalent, something such as data = "https://cldr.unicode.org 46.0.0", giving a provenance for the data set (not the implementation). But there are major caveats (the customization that everyone does and, more importantly, the potential for misuse).
ECMA-402 is designed from the ground up to express what Web and client-side developers need as opposed to what ICU4C happens to provide. There is a lot of functionality in ICU4C that is simply not relevant to the Web platform.
Additionally, ICU4C APIs carry cruft and do not always represent modern i18n best practices.
I agree with @srl295 that if we want to work toward a lower-level abstraction than ECMA-402, I'd look into CLDR. One of my first issues on this repo is #210 for exactly this type of functionality.
@ptu14, if you can identify specific gaps that you need in the Web platform, those can be individual proposals. For example, you mention date skeleton strings and "high demand for advanced date formatting capabilities." We could discuss specific date formatting proposals on their own merits.
If you would like to discuss this, please join [email protected] and we can schedule this for an upcoming call. If you are able to travel, many of us will also be here for the Unicode Technology Workshop on October 22-23 and an ECMA-402 Face-to-Face on October 24.