ecma402

Specify canonicalization algorithms for Intl enumeration

Open ptomato opened this issue 3 years ago • 2 comments

Once Intl.supportedValuesOf becomes part of the specification, apply the changes proposed in https://github.com/tc39/proposal-intl-enumeration/pull/49

ptomato, Nov 02 '22 23:11

@ben-allen to sync with @ptomato and coordinate on a PR.

sffc, May 02 '23 22:05

Note that time zones, unlike the other values that Intl.supportedValuesOf enumerates, are also specified in 262, because ECMAScript implementations that do not implement 402 can still support non-UTC IANA time zones (and therefore need to canonicalize them).

@gibson042 and I have been working on an editorial PR for 262 (https://github.com/tc39/ecma262/pull/3035) to specify how time zone canonicalization works there. (Editorial PRs for 402 and Temporal will follow if the 262 PR is accepted.) In summary, the PR does the following:

  • Add an implementation-defined AO AvailableTimeZoneIdentifiers that returns a list of {Identifier, CanonicalIdentifier} records.
  • Add a non-implementation-defined AO GetAvailableTimeZoneIdentifier(id) that returns the record whose Identifier ASCII-case-insensitively matches id, or ~empty~.
  • Replace CanonicalizeTimeZoneIdentifier with calls to GetAvailableTimeZoneIdentifier(id).CanonicalIdentifier.
  • Replace IsAvailableTimeZoneIdentifier with calls to GetAvailableTimeZoneIdentifier(id) is not ~empty~.
  • Rename DefaultTimeZone to SystemTimeZoneIdentifier to match naming of other related AOs. (And because there's no "default" time zone in Temporal.)
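To make the relationships between these AOs concrete, here is a minimal C++ sketch. It is not spec text or any engine's code: the three-entry table is hypothetical sample data (treating US/Eastern as a link to America/New_York), the helper names `IsAvailable`, `Canonicalize`, and `CanonicalTimeZoneIdentifiers` are made up for illustration, and `std::nullopt` stands in for ~empty~.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Mirrors the {Identifier, CanonicalIdentifier} record.
struct TimeZoneIdentifierRecord {
  std::string identifier;
  std::string canonicalIdentifier;
};

// Hypothetical stand-in for the implementation-defined AO
// AvailableTimeZoneIdentifiers. Sample data only; a real engine would
// generate this list from IANA and/or CLDR data.
const std::vector<TimeZoneIdentifierRecord>& AvailableTimeZoneIdentifiers() {
  static const std::vector<TimeZoneIdentifierRecord> records = {
      {"America/New_York", "America/New_York"},
      {"Europe/London", "Europe/London"},
      {"US/Eastern", "America/New_York"},  // non-canonical (link) ID
  };
  return records;
}

// Lower-cases ASCII letters only; other characters are left unchanged.
char AsciiToLower(char c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

bool AsciiCaseInsensitiveEquals(std::string_view a, std::string_view b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (AsciiToLower(a[i]) != AsciiToLower(b[i])) return false;
  }
  return true;
}

// Sketch of GetAvailableTimeZoneIdentifier(id): the record whose identifier
// ASCII-case-insensitively matches id, or std::nullopt (~empty~).
std::optional<TimeZoneIdentifierRecord> GetAvailableTimeZoneIdentifier(
    std::string_view id) {
  for (const auto& record : AvailableTimeZoneIdentifiers()) {
    if (AsciiCaseInsensitiveEquals(record.identifier, id)) return record;
  }
  return std::nullopt;
}

// Replaces IsAvailableTimeZoneIdentifier(id).
bool IsAvailable(std::string_view id) {
  return GetAvailableTimeZoneIdentifier(id).has_value();
}

// Replaces CanonicalizeTimeZoneIdentifier(id). Precondition: IsAvailable(id).
std::string Canonicalize(std::string_view id) {
  return GetAvailableTimeZoneIdentifier(id)->canonicalIdentifier;
}

// Enumeration derived from the same list: canonical IDs only, sorted.
std::vector<std::string> CanonicalTimeZoneIdentifiers() {
  std::vector<std::string> result;
  for (const auto& record : AvailableTimeZoneIdentifiers()) {
    if (record.identifier == record.canonicalIdentifier) {
      result.push_back(record.identifier);
    }
  }
  std::sort(result.begin(), result.end());
  return result;
}
```

In this sketch, availability checks, case normalization, canonicalization, and enumeration of canonical IDs are all derived from the single implementation-defined list, which is the consolidation the summary describes.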

I don't know if this approach is relevant to other things that Intl.supportedValuesOf enumerates, but Richard and I would be happy to coordinate with @ben-allen and @ptomato if similar idioms could be used for those other enumerations too.

BTW, here are a few reasons for choosing the set of AOs above:

  • Simplify and consolidate AOs and reduce the number of implementation-defined AOs required (instead of one for enumeration, another for canonicalization, another for case normalization, etc.).
  • Simplify 402, Temporal, and (if accepted) proposal-canonical-tz spec text.
  • Expose both canonical and non-canonical IDs, either as a full list or as an individual pair, to support a wider range of use-cases without having to change existing behavior or add new AOs. For example, had these AOs been in place earlier, then Temporal, Intl.supportedValuesOf, and proposal-canonical-tz wouldn't need to change or add any AOs to work with time zone IDs.
  • Narrow the scope of 262 text that needs to be overridden in 402, because overrides introduce complexity for readers and implementers. Similarly, reduce the scope of Temporal text that overrides 262 and/or 402.
  • Encourage implementers to think about caching and/or hard-coding the list of IDs and using their indexes for canonicalization, instead of fetching IDs one at a time and storing strings in internal slots. Doing this could make ZonedDateTime, TimeZone, and DateTimeFormat objects more space-efficient, as in this pseudo-C++ code:
```cpp
struct TimeZoneIdRecord {
  const unsigned short idIndex;          // could also be a 10-bit field
  const unsigned short canonicalIdIndex; // could also be a 10-bit field
};

// Everything below populated via automated build step using IANA and/or CLDR data

const unsigned short TIMEZONE_ID_COUNT = 579;

const char* sortedTimeZoneIds[TIMEZONE_ID_COUNT] = {
  "Africa/Abidjan",
  "Africa/Accra",
  // . . .
};

// for case-normalized comparisons
const char* lowerCaseTimeZoneIds[TIMEZONE_ID_COUNT] = {
  "africa/abidjan",
  "africa/accra",
  // . . .
};

const TimeZoneIdRecord sortedTimeZoneIdMap[TIMEZONE_ID_COUNT] = {
  { 0, 0 },    // example of a canonical ID
  { 1, 1 },
  // . . .
  { 203, 16 }, // example of a non-canonical ID
  // . . .
};
```
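A runnable sketch of how a lookup over tables shaped like these might work follows. The three-entry tables, `kSampleCount`, `AsciiLower`, `FindTimeZoneId`, and `CanonicalTimeZoneId` are hypothetical. The sketch also assumes the build step orders entries by their lower-cased form, so that the index found by a binary search over `lowerCaseTimeZoneIds` is valid in the other tables too.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct TimeZoneIdRecord {
  const unsigned short idIndex;
  const unsigned short canonicalIdIndex;
};

// Hypothetical sample tables. Assumption: entries are sorted by their
// lower-cased form, so one index refers to the same ID in every table.
constexpr std::size_t kSampleCount = 3;
const char* const sortedTimeZoneIds[kSampleCount] = {
    "America/New_York", "Europe/London", "US/Eastern"};
const char* const lowerCaseTimeZoneIds[kSampleCount] = {
    "america/new_york", "europe/london", "us/eastern"};
const TimeZoneIdRecord sortedTimeZoneIdMap[kSampleCount] = {
    {0, 0}, {1, 1}, {2, 0}};  // US/Eastern -> America/New_York

// Lower-cases ASCII letters only.
std::string AsciiLower(std::string_view s) {
  std::string out(s);
  for (char& c : out) {
    if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
  }
  return out;
}

// Binary search for a case-insensitive match. Returns the matching record,
// or std::nullopt (playing the role of ~empty~).
std::optional<TimeZoneIdRecord> FindTimeZoneId(std::string_view id) {
  const std::string key = AsciiLower(id);
  const char* const* begin = lowerCaseTimeZoneIds;
  const char* const* end = lowerCaseTimeZoneIds + kSampleCount;
  const char* const* it = std::lower_bound(
      begin, end, key, [](const char* entry, const std::string& k) {
        return k.compare(entry) > 0;  // true when entry < k
      });
  if (it == end || key != *it) return std::nullopt;
  return sortedTimeZoneIdMap[it - begin];
}

// Canonical spelling for an available ID, or nullptr if unavailable.
const char* CanonicalTimeZoneId(std::string_view id) {
  std::optional<TimeZoneIdRecord> record = FindTimeZoneId(id);
  return record ? sortedTimeZoneIds[record->canonicalIdIndex] : nullptr;
}
```

With this layout, an object would only need to keep a two-byte index in its internal slot rather than a string; how much that saves in practice would depend on the engine.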

justingrant, May 03 '23 01:05