icu4x

[Date]TimeFormatter constructors don't take hour12 override

Open hsivonen opened this issue 2 years ago • 19 comments

ECMA-402 has hour12 and hourCycle overrides for the locale's hour cycle. AFAICT, ICU4X [Date]TimeFormatter constructors don't have API surface for this override. For ECMA-402 compat, ICU4X should have API surface for this.

hsivonen avatar Aug 17 '23 09:08 hsivonen

Maybe hourCycle is available via -u-hc-? Still leaves hour12 (which happens to be what appears in various preference UIs).

hsivonen avatar Aug 17 '23 10:08 hsivonen

hourCycle is in icu_datetime::options::components::Bag.

hsivonen avatar Aug 17 '23 10:08 hsivonen

-u-hc- works.

I don't think we have hour12. You can mostly get the same effect via -u-hc-, but then you need to choose between h11/h12 or h23/h24 yourself instead of letting the locale choose.
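For glue code that wants to check whether a locale already carries the override, the -u-hc- value can be pulled out of the locale string by hand. This is a minimal std-only sketch: `hour_cycle_keyword` is a hypothetical helper, not an ICU4X API, and it assumes a well-formed BCP 47 tag without validating it.

```rust
/// Returns the value of the `hc` key in the `-u-` extension of a
/// BCP 47 locale string, if present. Hypothetical helper, not ICU4X API.
fn hour_cycle_keyword(locale: &str) -> Option<String> {
    let lower = locale.to_ascii_lowercase();
    // Skip everything up to and including the `u` singleton.
    let mut subtags = lower.split('-').skip_while(|s| *s != "u").skip(1);
    while let Some(subtag) = subtags.next() {
        if subtag.len() == 1 {
            // Another singleton starts a different extension.
            return None;
        }
        if subtag == "hc" {
            // Keyword values are 3 to 8 characters long.
            return subtags.next().filter(|v| v.len() >= 3).map(str::to_owned);
        }
    }
    None
}
```

The result is only the explicit h11/h12/h23/h24 choice; it says nothing about what the locale itself would prefer, which is the gap this issue is about.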

sffc avatar Aug 17 '23 21:08 sffc

Letting the locale affect hour12 expansion to hourCycle seems required for ECMA-402 compliance.

AFAICT, by the time the [Date]TimeFormatter constructor has done the provider-level resolution of the locale data, there doesn't appear to be a way for the application code to modify the instance to perform hour12 resolution in application code.

hsivonen avatar Aug 18 '23 07:08 hsivonen

macOS Ventura and Gnome expose a boolean system pref for this topic, so hour12 might be relevant to support to be able to honor system preferences. I don't have test code that I'd know would behave according to the system boolean pref semantics. However, from reading the ECMA-402 spec, I'm a bit surprised at how the ECMA-402 semantics are supposed to work. AFAICT, hour12=false for en-US would resolve to h24. Do users actually want that result instead of h23? (As a user of the en-US locale for untranslated strings with 24-hour clock enabled, I don't want h24. I guess I'll need to make a TODO item of observing the system clock at midnight.)

hsivonen avatar Aug 18 '23 07:08 hsivonen

Looking at https://github.com/unicode-org/cldr-json/blob/80a94b0f6c3a34d6e2dc0dca8639a54babc87f94/cldr-json/cldr-core/supplemental/timeData.json#L4 , I observe:

  1. The preferred cycle for each locale is either h or H, i.e. h12 or h23.
  2. k and K do not appear in allowed cycles. hb and hB do, but I don't find a spec explaining what they mean.

Given that the preferred cycle for each locale is either h12 or h23, it's unclear to me what problem h11 and h24 or hour12 expanding to h11 or h24 solve.

hsivonen avatar Aug 18 '23 08:08 hsivonen

> However, from reading the ECMA-402 spec, I'm a bit surprised at how the ECMA-402 semantics are supposed to work.

The current spec is incorrect. There's a PR to fix this → https://github.com/tc39/ecma402/pull/758.

anba avatar Aug 18 '23 08:08 anba

> > However, from reading the ECMA-402 spec, I'm a bit surprised at how the ECMA-402 semantics are supposed to work.
>
> The current spec is incorrect. There's a PR to fix this → tc39/ecma402#758.

Thanks. Given that change and the non-existence of k-default and K-default locales, I guess one option would be to have ICU4X ECMA-402 wrapper code hard-code hour12=true to h12 and hour12=false to h23, and close this API request as WONTFIX.

hsivonen avatar Aug 18 '23 08:08 hsivonen

> 2. k and K do not appear in allowed cycles. hb and hB do, but I don't find a spec explaining what they mean.

Japan has K → https://github.com/unicode-org/cldr/blob/343bde9e7e8d6cf6f2c57e257fa4f074df970311/common/supplemental/supplementalData.xml#L4888

b and B are day period markers: https://unicode.org/reports/tr35/tr35-dates.html#dfst-period

anba avatar Aug 18 '23 08:08 anba

> > 2. k and K do not appear in allowed cycles. hb and hB do, but I don't find a spec explaining what they mean.
>
> Japan has K → https://github.com/unicode-org/cldr/blob/343bde9e7e8d6cf6f2c57e257fa4f074df970311/common/supplemental/supplementalData.xml#L4888

Oops. I missed that. So: K is allowed in one locale but isn't the default anywhere and k is specced for completeness and isn't in use anywhere?

> b and B are day period markers: https://unicode.org/reports/tr35/tr35-dates.html#dfst-period

Thanks.

hsivonen avatar Aug 18 '23 08:08 hsivonen

> Oops. I missed that. So: K is allowed in one locale but isn't the default anywhere and k is specced for completeness and isn't in use anywhere?

Yes. K isn't the default hour-cycle for Japan per <timeData>/<hours>, but when selecting {hour: "numeric", hour12: true}, the resolved pattern will contain K, see here. That also means it's not possible to replace hour12=true with hourCycle=h12.


For example new Intl.DateTimeFormat("en", {hour:"numeric"}) can be customised as follows:

| Options | Skeleton | Resolved Pattern | Final Pattern |
| --- | --- | --- | --- |
| {hour:"numeric"} | j | h a | h a |
| {hour:"numeric", hour12: true} | h | h a | h a |
| {hour:"numeric", hour12: false} | H | HH | HH |
| {hour:"numeric", hourCycle: "h11"} | h | h a | K a |
| {hour:"numeric", hourCycle: "h12"} | h | h a | h a |
| {hour:"numeric", hourCycle: "h23"} | H | HH | HH |
| {hour:"numeric", hourCycle: "h24"} | H | HH | kk |

And new Intl.DateTimeFormat("ja", {hour:"numeric"}) can be customised as follows:

| Options | Skeleton | Resolved Pattern | Final Pattern |
| --- | --- | --- | --- |
| {hour:"numeric"} | j | H時 | H時 |
| {hour:"numeric", hour12: true} | h | aK時 | aK時 |
| {hour:"numeric", hour12: false} | H | H時 | H時 |
| {hour:"numeric", hourCycle: "h11"} | h | aK時 | aK時 |
| {hour:"numeric", hourCycle: "h12"} | h | aK時 | ah時 |
| {hour:"numeric", hourCycle: "h23"} | H | H時 | H時 |
| {hour:"numeric", hourCycle: "h24"} | H | H時 | k時 |

In an input skeleton, h is automatically matched to either h or K in the resolved pattern. Similarly, H is matched to either H or k.
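The last step in the tables, going from the resolved pattern to the final pattern when an explicit hourCycle is given, can be sketched as a symbol substitution. This is only an illustration of that step, assuming the resolved pattern is already in the right 12-hour or 24-hour family; `HourCycle` and `apply_hour_cycle` are hypothetical names and not taken from ICU4X, ICU4C, or the spec.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HourCycle {
    H11,
    H12,
    H23,
    H24,
}

impl HourCycle {
    /// TR35 pattern symbol for this hour cycle.
    fn symbol(self) -> char {
        match self {
            HourCycle::H11 => 'K',
            HourCycle::H12 => 'h',
            HourCycle::H23 => 'H',
            HourCycle::H24 => 'k',
        }
    }
}

/// Rewrites every hour symbol in `pattern` to the symbol for `hc`,
/// leaving text inside single-quoted literals untouched.
fn apply_hour_cycle(pattern: &str, hc: HourCycle) -> String {
    let mut in_literal = false;
    pattern
        .chars()
        .map(|c| match c {
            '\'' => {
                // A quote toggles literal mode; an escaped '' toggles twice.
                in_literal = !in_literal;
                c
            }
            'h' | 'H' | 'K' | 'k' if !in_literal => hc.symbol(),
            _ => c,
        })
        .collect()
}
```

With this, the resolved "aK時" plus hourCycle h12 gives "ah時", and the resolved "HH" plus h24 gives "kk", matching the rows above. The day period symbol is left alone, which is why the sketch cannot switch a pattern between the 12-hour and 24-hour families.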

Spec:

  • https://unicode.org/reports/tr35/tr35-dates.html#availableFormats_appendItems
  • https://unicode.org/reports/tr35/tr35-dates.html#dfst-hour

The allowed strings in <timeData>/<hours> are mostly relevant for the C skeleton, so they're not yet relevant to ECMA-402 date-time formatting. (Spec: https://unicode.org/reports/tr35/tr35-dates.html#availableFormats_appendItems)

They're possibly relevant for the stage-3 "Intl Locale Info" proposal. There's a HourCyclesOfLocale operation, which is spec'ed to return the hour-cycle formats which are in "common use for date and time formatting". So this operation could return the allowed values from <timeData>/<hours>.

ICU4C doesn't have a public API to retrieve the allowed values, though. Instead it's necessary to manually read the resource data, cf. DateTimeFormat::GetAllowedHourCycles.

anba avatar Aug 18 '23 10:08 anba

> Yes. K isn't the default hour-cycle for Japan per <timeData>/<hours>, but when selecting {hour: "numeric", hour12: true}, the resolved pattern will contain K, see here. That also means it's not possible to replace hour12=true with hourCycle=h12.

Thanks. So ICU4X is currently missing a way to handle hour12 in a data-driven way.

Just so that I understand the feasibility of hard-coded special cases if this issue isn't addressed in ICU4X itself: It would be possible for ECMA-402 implementation glue code to get correct results (with the scope of what's known about what is in CLDR) by expanding the boolean hour12 and the boolean "region is JP" to hourCycle, right?

That is:

if hour12 {
  if region_of_locale_is_JP {
    h11
  } else {
    h12
  }
} else {
  h23
}

> They're possibly relevant for the stage-3 "Intl Locale Info" proposal. There's a HourCyclesOfLocale operation, which is spec'ed to return the hour-cycle formats which are in "common use for date and time formatting". So this operation could return the allowed values from <timeData>/<hours>.

The rendered spec that you linked to has HourCyclesOfLocale, but the README claims "Hour Cycle DROPPED by Champion". @FrankYFTang , is the current intention to include or exclude HourCyclesOfLocale?

> ICU4C doesn't have a public API to retrieve the allowed values, though. Instead it's necessary to manually read the resource data, cf. DateTimeFormat::GetAllowedHourCycles.

I don't see any non-test callers for that method. What am I missing?

hsivonen avatar Aug 21 '23 13:08 hsivonen

> Just so that I understand the feasibility of hard-coded special cases if this issue isn't addressed in ICU4X itself: It would be possible for ECMA-402 implementation glue code to get correct results (with the scope of what's known about what is in CLDR) by expanding the boolean hour12 and the boolean "region is JP" to hourCycle, right?

It needs to be hard-coded on the language, not the region, because the date-time patterns are in https://github.com/unicode-org/cldr/blob/main/common/main/ja.xml.
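A minimal sketch of that language-keyed fallback for glue code, assuming only what has come up in this thread about CLDR (ja being the one language whose 12-hour patterns use K). `HourCycle` and `expand_hour12` are hypothetical names for illustration, not ICU4X API.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HourCycle {
    H11,
    H12,
    H23,
}

/// Expands the boolean `hour12` to an hour cycle, special-casing the
/// language (not the region) of the locale.
fn expand_hour12(locale: &str, hour12: bool) -> HourCycle {
    // The language is the first subtag of a BCP 47 locale identifier.
    let language = locale.split('-').next().unwrap_or("");
    if !hour12 {
        HourCycle::H23
    } else if language.eq_ignore_ascii_case("ja") {
        // Hard-coded assumption: only Japanese patterns use K (0-11).
        HourCycle::H11
    } else {
        HourCycle::H12
    }
}
```

Keyed this way, ja and ja-JP both expand hour12=true to h11, while en-JP expands to h12 because its patterns come from the English data.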

> I don't see any non-test callers for that method. What am I missing?

Only the parts relevant for the "Unified Intl API" work (bug 1686965) have been committed in bug 1693576. The rest will be put up for review when the open issues in the proposal have been resolved.

anba avatar Aug 21 '23 13:08 anba

There is some interesting code to handle some of this resolution logic in components/datetime/src/pattern/hour_cycle.rs

I also observe that we already have the preferred hour cycle (h11h12 or h23h24) in ICU4X data: https://github.com/unicode-org/icu4x/blob/main/provider/datagen/tests/data/json/datetime/timelengths%401/en.json

So I think everything is here to support hour12 if we were to add it to an options bag somewhere.

sffc avatar Aug 21 '23 18:08 sffc

Just to make this clearer for those playing along here: Japan is the only region whose CLDR time data allows the K value for hours.

<hours preferred="H" allowed="H K h" regions="JP"/>

Ref: https://github.com/unicode-org/cldr/blob/343bde9e7e8d6cf6f2c57e257fa4f074df970311/common/supplemental/supplementalData.xml#L4888

The options are h, H, K, k and are defined as such:

h is 1-12, H is 0-23, K is 0-11 and k is 1-24; see https://unicode.org/reports/tr35/tr35-dates.html#dfst-hour

Currently the ECMA spec incorrectly assumes a coupling of h with k and of K with H. That is, the following is baked in as an implicit assumption:

  • a locale that presents twelve-hour time with hours 0-11/00-11 (K) will present twenty-four-hour time as 00-23 (H)
  • a locale that presents twelve-hour time with hours 1-12/01-12 (h) will present twenty-four-hour time as 01-24 (k)
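Written out as code, the coupling in those two bullets looks roughly like this. It is an illustration of the assumption as described here, not a transcription of the spec text; `HourCycle` and `expand_hour12_coupled` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HourCycle {
    H11,
    H12,
    H23,
    H24,
}

/// Expands `hour12` from the locale's default cycle alone, assuming that
/// zero-based locales stay zero-based and one-based locales stay one-based.
fn expand_hour12_coupled(locale_default: HourCycle, hour12: bool) -> HourCycle {
    let zero_based = matches!(locale_default, HourCycle::H11 | HourCycle::H23);
    match (hour12, zero_based) {
        (true, true) => HourCycle::H11,
        (true, false) => HourCycle::H12,
        (false, true) => HourCycle::H23,
        (false, false) => HourCycle::H24,
    }
}
```

For a locale whose default is h12, such as en-US, hour12=false comes out as h24 under this coupling, which is the surprising result raised earlier in the thread.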

The ECMA standard definitely needs to change, as the current behaviour is a bug. The universal (as far as I've been able to determine) rejection of k, and the only occasional adoption of K as an option, render the above assumption incorrect, and realistically this should have been identified prior to publication. https://github.com/tc39/ecma402/pull/758 has identified a solution that expands how 12-hour and 24-hour time is presented at a regional level. Work is ongoing to get this to a point of acceptance. It is slated for the 2023-09 TC39 meeting.

jufemaiz avatar Sep 14 '23 01:09 jufemaiz

We still need to figure out a way to support ECMA-402's hour12 in ICU4X.

It has always been really clunky how there are two ways of specifying almost the same thing.

Thought: should I bring to CLDR a proposal to add variants to HourCycle such as

  • auto12 or a12 = pick the best 12-hour variant for the locale
  • auto24 or a24 = pick the best 24-hour variant for the locale

It should look at the whole locale when determining the resolved hour cycle. Examples:

| Locale Identifier | Resolved Hour Cycle | Comment |
| --- | --- | --- |
| en-US | H12 | |
| de-DE | H23 | |
| ja-JP | H23 | |
| en-US-u-hc-h11 | H11 | |
| de-DE-u-hc-h11 | H11 | |
| ja-JP-u-hc-h11 | H11 | |
| en-US-u-hc-h12 | H12 | |
| de-DE-u-hc-h12 | H12 | |
| ja-JP-u-hc-h12 | H12 | |
| en-US-u-hc-h23 | H23 | |
| de-DE-u-hc-h23 | H23 | |
| ja-JP-u-hc-h23 | H23 | |
| en-US-u-hc-a12 | H12 | |
| de-DE-u-hc-a12 | H12 | |
| ja-JP-u-hc-a12 | H11 | <== this is the interesting one |
| en-US-u-hc-a23 | H23 | |
| de-DE-u-hc-a23 | H23 | |
| ja-JP-u-hc-a23 | H23 | |

Or maybe this should just go as an option on the time field set, more like ECMA-402 does it. It would be an enum with 3 variants (or an Option of a 2-variant enum): Auto, Prefer12, and Prefer24.

| Locale Identifier | Hour Cycle Option | Resolved Hour Cycle | Comment |
| --- | --- | --- | --- |
| en-US | Auto | H12 | |
| de-DE | Auto | H23 | |
| ja-JP | Auto | H23 | |
| es-MX-u-hc-h11 | Auto | H11 | |
| en-US | Prefer12 | H12 | |
| de-DE | Prefer12 | H12 | |
| ja-JP | Prefer12 | H11 | <== interesting case |
| es-MX-u-hc-h11 | Prefer12 | H11 | <== interesting case |
| en-US | Prefer24 | H23 | |
| de-DE | Prefer24 | H23 | |
| ja-JP | Prefer24 | H23 | |
| es-MX-u-hc-h11 | Prefer24 | H23 | |
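A rough sketch of resolution logic that reproduces the rows of this option-based table. Everything here is hypothetical: `HourCyclePreference`, `LocaleHourData` and `resolve` are illustration names, the per-locale data would have to come from CLDR, and letting a 24-hour -u-hc- value survive Prefer24 is an assumption by symmetry that the table does not cover.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HourCycle {
    H11,
    H12,
    H23,
    H24,
}

impl HourCycle {
    fn is_12_hour(self) -> bool {
        matches!(self, HourCycle::H11 | HourCycle::H12)
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HourCyclePreference {
    Auto,
    Prefer12,
    Prefer24,
}

/// Hypothetical per-locale data: the preferred cycle plus the best
/// 12-hour and 24-hour variants for that locale.
struct LocaleHourData {
    preferred: HourCycle,
    best_12: HourCycle,
    best_24: HourCycle,
}

/// Resolves the hour cycle from locale data, an optional -u-hc- value,
/// and the option on the time field set.
fn resolve(
    data: &LocaleHourData,
    hc_keyword: Option<HourCycle>,
    preference: HourCyclePreference,
) -> HourCycle {
    match preference {
        // No option: the -u-hc- value wins, else the locale default.
        HourCyclePreference::Auto => hc_keyword.unwrap_or(data.preferred),
        // Keep the -u-hc- value only if it is in the requested family.
        HourCyclePreference::Prefer12 => hc_keyword
            .filter(|hc| hc.is_12_hour())
            .unwrap_or(data.best_12),
        HourCyclePreference::Prefer24 => hc_keyword
            .filter(|hc| !hc.is_12_hour())
            .unwrap_or(data.best_24),
    }
}
```

The two interesting rows fall out of the data and the filter respectively: ja-JP with Prefer12 picks its best 12-hour variant (H11), and es-MX-u-hc-h11 with Prefer12 keeps the explicit H11 because it is already a 12-hour cycle.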

sffc avatar Jul 10 '25 06:07 sffc

CLDR ticket: https://unicode-org.atlassian.net/browse/CLDR-18894

sffc avatar Aug 13 '25 21:08 sffc

A potential proposal:

| Value of -u-hc | Name | Description | Comments |
| --- | --- | --- | --- |
| h11 | H11 | 12-hour cycle, 0-11 | |
| h12 | H12 | 12-hour cycle, 1-12 | |
| h23 | H23 | 24-hour cycle, 0-23 | |
| h24 | H24 | 24-hour cycle, 1-24 | Might remove |
| c12 | Clock12 | 12-hour cycle | |
| c24 | Clock24 | 24-hour cycle | Not required if H24 is removed |
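For illustration, the value-to-name mapping in this proposal could be parsed like so. The variant names follow the Name column; `HourCycleValue` and `parse_hc_value` are hypothetical, and since the proposal is still in flux the set of accepted values is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HourCycleValue {
    H11,
    H12,
    H23,
    H24,
    Clock12,
    Clock24,
}

/// Parses a -u-hc- value from the proposed table, case-insensitively.
/// Returns None for anything the table does not list.
fn parse_hc_value(value: &str) -> Option<HourCycleValue> {
    match value.to_ascii_lowercase().as_str() {
        "h11" => Some(HourCycleValue::H11),
        "h12" => Some(HourCycleValue::H12),
        "h23" => Some(HourCycleValue::H23),
        "h24" => Some(HourCycleValue::H24),
        "c12" => Some(HourCycleValue::Clock12),
        "c24" => Some(HourCycleValue::Clock24),
        _ => None,
    }
}
```

Clock12 and Clock24 would still need a locale-aware step afterwards to pick the concrete cycle, for example H11 rather than H12 for Japanese.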

sffc avatar Sep 09 '25 22:09 sffc

I'm pulling this up to 2.2 since it is being added to CLDR in the next release.

sffc avatar Nov 10 '25 19:11 sffc