icu4x
icu4x copied to clipboard
Consider not using `Etc/Unknown` time zone
The Etc/Unknown timezone is not a TZDB timezone. It was invented by CLDR to signal the absence of time zone information, and this is causing confusion when surfaced to users: https://www.google.com/search?q=etc%2Funknown, https://github.com/tc39/proposal-temporal/issues/2510.
Using Etc/Unknown means that every API that handles a time zone input has to special-case this input, which is easy to miss. A more solid API would use Option<TimeZoneBcp47Id> with None instead of Etc/Unknown.
Our TimeZoneMapper already returns Option<TimeZoneBcp47Id>, and our formatter input already uses Option<TimeZoneBcp47Id> so we effectively have two ways of representing unknown time zones: None and Some("unk"). During formatting, None will fall back to the offset (good), and then to {GMT+?} (good), whereas Some("unk") will format as Unknown City Daylight Time or heure: ville inconnue under location formats even when there is an offset that could be formatted.[^1]
My concrete proposal is:
- Remove the
Etc/Unknownkey (and its alias,Factory) from the IANA->BCP47 mapping.- Asking for the BCP47 ID for the
Etc/UnknownIANA ID will fail, becauseEtc/Unknownis not a valid IANA ID - Asking for the BCP47 ID for the
FactoryIANA ID will fail because we say so (and document it)
- Asking for the BCP47 ID for the
- Remove the
unkkey from the BCP47->IANA mapping- It's incorrect to have it in there, because
Etc/Unknownis not a canonical IANA ID
- It's incorrect to have it in there, because
- Make the location formats fail if there's no exemplar city for a time zone, so that the next format can be tried. The location formats currently produce very awkward strings for the
Etc/Unknowntime zone, even when an offset could be formatted.
[^1]: I understand UTS-35 wants it like this, but I don't see why.
@justingrant
We've discussed Etc/Unknown a number of times in the ECMAScript context, but I'm not finding the conclusion of those conversations.
I know that @justingrant got more aliases added to 2024b. I thought Factory was part of that but maybe not.
As far as time zone display goes, there's a difference between "unk" and None, as you noted, which is whether "Unknown City" is displayed instead of the UTC offset time zone. It is likely desirable if there is no offset available: Unknown City is better than GMT+? when starting from a location format. Maybe this could be handled internally by sticking Unknown City as the final fallback for location-based formats (after trying offset).
And, as you also noted, we're doing this because UTS 35 says we should. So this seems like material we maybe should first discuss with them.
It is likely desirable if there is no offset available: Unknown City is better than GMT+? when starting from a location format.
This won't happen under the plans in #5533, as an input with no time zone and no offset won't be possible.
I also think it should say something like Unknown timezone, Unknown City Time is weird because normal people don't think of time zones as being associated with cities, but regions.
Wait: if we move in the direction of being super strict about fields being present or not in the time zone, as you propose in #5533, then it wouldn't be possible to construct a TimeZone::<Location> with an unknown city. That's a problem, because cities get added from time to time and ICU4X should have fallback behavior for them (as prescribed by both UTS-35 and ECMA-402). For example, even if we don't return Etc/Unknown from the parse function, it should be possible to write code such as
// Europe/Kyiv was not added until 2022b; it was previously Europe/Kiev.
// Older installations might not know about the updated name yet.
let location_time_zone = TimeZone::<Location>::try_from_str("Europe/Kyiv")
.unwrap_or(TimeZone::<Location>::unknown());
I agree with @robertbastian that the current state is bad and annoying for both implementers and end-users.
FWIW, the current state in ECMAScript is that when the engine doesn't recognize the OS's time zone, ICU4C returns Etc/Unknown when asked for the system time zone, and at least in Chrome and Safari AFAIK will return Etc/Unknown to the ECMAScript userland caller. But if you try to use Etc/Unknown as input, V8/JSC/SM engines will all throw. Example:
new Intl.DateTimeFormat("en", {timeZone: 'Etc/Unknown'})
// => Uncaught RangeError: invalid time zone in DateTimeFormat(): Etc/Unknown
I believe that this behavior means that new Intl.DateTimeFormat() or Temporal.now.zonedDateTimeISO() can throw on a system whose OS's TZDB is newer than the browser's TZDB. This is not generally an issue for Safari because on Apple platforms JSC will use the OS's TZDB so they'll never get out of sync. But it's a problem for Chrome and Edge for users who install into a newly-added time zone. I'm not sure about FF.
We've discussed Etc/Unknown a number of times in the ECMAScript context, but I'm not finding the conclusion of those conversations.
The conclusion of those previous discussions was that there was no consensus on what to do about it. ☹️ See the 2023-06-01 and 2023-06-29 TG2 meeting notes where we discussed this.
I know that @justingrant got more aliases added to 2024b. I thought Factory was part of that but maybe not.
What you might be thinking of is a change made in CLDR where Factory is now an alias for Etc/Unknown. So if the system's time zone is Factory, then the latest ICU4C reports the canonical ID as Etc/Unknown. AFAIK, systems with Factory as their time zone are very rare because modern OS's now make time zone setting part of OS installation or IT provisioning.
But there was no upstream change in how Factory is handled in IANA. Note that this change wasn't rejected by IANA. I just didn't ask them for any changes.
I could ask the IANA TZ list if they'd be willing to add Etc/Unknown, which currently doesn't exist in IANA, and make Factory a Link to Etc/Unknown. Should I do this? Based on how surprisingly easy it was to get them to update those legacy aliases, I think there's a reasonable chance of success with this change in IANA.
At least then we'd have a universal name across IANA and CLDR for the "I have no idea what this time zone is" case. But it still leaves the question of how engines (or ICU4X) should handle that unknown time zone case. Should they:
- Throw when asked for the system time zone ID?
- Not throw when Etc/Unknown is provided as a time zone ID input, but instead treat Etc/Unknown as UTC for calculations and as "Unknown time zone" for localization?
- Not throw when asked for the system time zone ID, but continue throwing when using that ID as input even though it's valid in IANA? (This is AFAIK the current behavior of engines today.)
I also think it should say something like
Unknown timezone,Unknown City Timeis weird because normal people don't think of time zones as being associated with cities, but regions.
I strongly agree with this. Only people who know about the IANA Time Zone Database will associate cities with time zones. I think "Unknown time zone" is the right English localized name for Etc/Unknown and Factory.
I'm not sure about FF.
If the system time zone identifier is unknown, Firefox first checks if there's a Etc/GMT±XX zone which matches the system time zone offset. If there's no such replacement time zone, Firefox falls back to UTC.
This seems similar to the behaviour we are discussing in #5533, i.e. if there's no time zone but an offset, we can work with the offset. I feel like the key questions are
- Are
Some("Etc/Unknown")andNonethe same time zone? - Are
Some(0)andNonethe same offset?
It sounds like FF's answer to those is yes and yes, ICU4X's current answer is no and no.
@anba if FF fails to find a GMT+XX zone for the system offset and falls back to UTC, which offset does it then use, system or UTC?
@anba if FF fails to find a
GMT+XXzone for the system offset and falls back to UTC, which offset does it then use, system or UTC?
It falls back to UTC offset. (This is for the Intl.DateTimeFormat code path. I wouldn't be surprised if other parts in Firefox use a different fallback mechanism, because this situation when the system time zone isn't recognised doesn't happen too often, so it's not well tested. Bug 1856428 and Bug 1679532 are two actual examples when this fallback path was used.)
#5661 #5656
This zone seems to exist in ICU only because they weren't able to make the API return null due to stability concerns: https://sourceforge.net/p/icu/mailman/icu-design/thread/OF8057666A.AF51DAC4-ON8525783C.00236E03-8525783C.00238179%40lotus.com/#msg27085500
We are already returning an Option, so we should just do None.
WG discussion:
Full: (SpecificNonLocation)
LocationOrGarbage
Option<Offset>
ZoneVariant
LocalTime
AtTime: (GenericLocation, GenericNonLocation)
LocationOrGarbage
LocalTime
Option<Offset>
LocationWithOffset: (GenericLocation)
LocationOrGarbage
Option<Offset>
OffsetOnly: (Offset)
Option<Offset>
LocationWithVariant:
LocationOrGarbage
ZoneVariant
LocationOnly:
LocationOrGarbage
OffsetAtTime:
Option<Offset>
LocalTime
IXDTF cases:
Parse errors:
Z[America/Chicago]-> parse errorZ[Foo/Bar]-> parse error
Z:
Z-> utc, 0 (anything)
OffsetOnly:
-0500-> -5 (OffsetOnly)+?-> (OffsetOnly)
LocationOnly:
[America/Chicago]-> uschi (LocationOnly)[Foo/Bar]-> unk (LocationOnly)
Full:
-0500[America/Chicago]-> uschi, -5 (LocationWithOffset)-0500[Foo/Bar]-> unk, -5 (LocationWithOffset)+?[Foo/Bar]-> unk, unk (LocationWithOffset)
Discussion:
- @sffc I think that OffsetOnly is a unique use case from offset-with-Etc/Unknown. If you have an offset only, you should never be allowed to pass it into a format that needs a location. It might also be the input to some hypothetical future algorithm that infers a location. If the location is specified but unknown (like Europe/Kyiv), it's fine to use a location format (when you update your data, you might start getting a match), and such a time zone should never be passed into a location inference algorithm.
- @robertbastian - This algorithm cannot ever happen. You cannot guess the time zone from a single timestamp and a locale. If you know the location then you already know the location. Europe/Kyiv can already become Etc/Unknown or unk way before hitting ICU4X.
- @sffc Given a TZDB, it would be possible to write an algorithm that picks the best time zone for a given offset and local time. It doesn't need to depend on the locale's region, though the region could be used as a hint.
- @robertbastian - No
- @robertbastian - An offset only is a poor man's timezone, we should not encourage that as an i18n library. We should force the user to give us offset and location, and if they're parsing it from an ixdtf string that's offset only, we can make them explicitly add unknown. An offset is a property of a time zone but not a time zone itself.
More scratch pad:
TimeZoneInfo<OffsetOnly>
UtcOffset
ZonedDateTime<Iso, TimeZoneInfo<LocationOnly>>
ZonedDateTime<Iso, UtcOffset>
(DateTime<Iso>, UtcOffset)
UtcOffset::with_unknown_location() -> TimeZoneInfo<LocationWithOffset>
ZonedDateTime::<UtcOffset>::with_unknown_location() -> ZonedDateTime<TimeZoneInfo<LocationWithOffset>>
pub struct ZonedDateTime<A, Z> {
date: Date<A>,
time: Time,
zone: Z,
}
Conclusions:
TimeZoneInforequires a location or unkUtcOffsetis supported in formatting (only for the LocalizedOffset format) and parsing- Don't advertise offset-only formatting in docs and tutorials except when specifically discussing offset-only formats
ZonedDateTime, if we have such a type, can accept either aTimeZoneInfoor aUtcOffset
LGTM: @sffc @robertbastian @Manishearth
To add to my previous comment: I can envision some data that maps an offset to a "primary metazone": all 24+ of the most common offsets have a primary metazone. This requires TZDB since UTC+1 could be either "Western European Summer Time" or "Central European Standard Time". This operation should only be performed if there is no named time zone. I am not necessarily advocating for this, but I am using it as evidence that "offset only" and "offset with time zone unk" are two different concepts.
all 24+ of the most common offsets have a primary metazone.
I don't think that's correct. UTC+1 could, on the same day (in the northern summer), be Western European Summer Time or West Africa Time; two metazones with different standard offsets that match because one observes DST and the other doesn't.
I posted my idea in https://unicode-org.atlassian.net/browse/CLDR-18037.
This is mostly implemented in #5700. We haven't handled the Z case in parsing.