[Date]TimeFormatter constructors don't take hour12 override
ECMA-402 has hour12 and hourCycle overrides for the locale's hour cycle. AFAICT, ICU4X [Date]TimeFormatter constructors don't have API surface for this override. For ECMA-402 compat, ICU4X should have API surface for this.
Maybe hourCycle is available via -u-hc-? Still leaves hour12 (which happens to be what appears in various preference UIs).
hourCycle is in icu_datetime::options::components::Bag.
-u-hc- works.
I don't think we have hour12 except that you can mostly get that via -u-hc- but you need to choose between h11/h12 or h23/h24 instead of letting the locale choose.
Letting the locale affect hour12 expansion to hourCycle seems required for ECMA-402 compliance.
AFAICT, by the time the [Date]TimeFormatter constructor has done the provider-level resolution of the locale data, there doesn't appear to be a way for the application code to modify the instance to perform hour12 resolution in application code.
macOS Ventura and Gnome expose a boolean system pref for this topic, so hour12 might be relevant to support to be able to honor system preferences. I don't have test code that I'd know would behave according to the system boolean pref semantics. However, from reading the ECMA-402 spec, I'm a bit surprised at how the ECMA-402 semantics are supposed to work. AFAICT, hour12=false for en-US would resolve to h24. Do users actually want that result instead of h23? (As a user of the en-US locale for untranslated strings with 24-hour clock enabled, I don't want h24. I guess I'll need to make a TODO item of observing the system clock at midnight.)
Looking at https://github.com/unicode-org/cldr-json/blob/80a94b0f6c3a34d6e2dc0dca8639a54babc87f94/cldr-json/cldr-core/supplemental/timeData.json#L4 , I observe:
1. The preferred cycle for each locale is either h or H, i.e. h12 or h23.
2. k and K do not appear in allowed cycles. hb and hB do, but I don't find a spec explaining what they mean.
Given that the preferred cycle for each locale is either h12 or h23, it's unclear to me what problem h11 and h24 or hour12 expanding to h11 or h24 solve.
However, from reading the ECMA-402 spec, I'm a bit surprised at how the ECMA-402 semantics are supposed to work.
The current spec is incorrect. There's a PR to fix this → https://github.com/tc39/ecma402/pull/758.
Thanks. Given that change and the non-existence of k-default and K-default locales, I guess one option would be to have ICU4X ECMA-402 wrapper code hard-code hour12=true to h12 and hour12=false to h23, and close this API request as WONTFIX.
2. k and K do not appear in allowed cycles. hb and hB do, but I don't find a spec explaining what they mean.
Japan has K → https://github.com/unicode-org/cldr/blob/343bde9e7e8d6cf6f2c57e257fa4f074df970311/common/supplemental/supplementalData.xml#L4888
b and B are day period markers: https://unicode.org/reports/tr35/tr35-dates.html#dfst-period
Oops. I missed that. So: K is allowed in one locale but isn't the default anywhere and k is specced for completeness and isn't in use anywhere?
b and B are day period markers: https://unicode.org/reports/tr35/tr35-dates.html#dfst-period
Thanks.
Oops. I missed that. So: K is allowed in one locale but isn't the default anywhere and k is specced for completeness and isn't in use anywhere?
Yes. K isn't the default hour-cycle for Japan per <timeData>/<hours>, but when selecting {hour: "numeric", hour12: true}, the resolved pattern will contain K, see here. That also means it's not possible to replace hour12=true with hourCycle=h12.
For example new Intl.DateTimeFormat("en", {hour:"numeric"}) can be customised as follows:
| Options | Skeleton | Resolved Pattern | Final Pattern |
|---|---|---|---|
| {hour:"numeric"} | j | h a | h a |
| {hour:"numeric", hour12: true} | h | h a | h a |
| {hour:"numeric", hour12: false} | H | HH | HH |
| {hour:"numeric", hourCycle: "h11"} | h | h a | K a |
| {hour:"numeric", hourCycle: "h12"} | h | h a | h a |
| {hour:"numeric", hourCycle: "h23"} | H | HH | HH |
| {hour:"numeric", hourCycle: "h24"} | H | HH | kk |
And new Intl.DateTimeFormat("ja", {hour:"numeric"}) can be customised as follows:
| Options | Skeleton | Resolved Pattern | Final Pattern |
|---|---|---|---|
| {hour:"numeric"} | j | H時 | H時 |
| {hour:"numeric", hour12: true} | h | aK時 | aK時 |
| {hour:"numeric", hour12: false} | H | H時 | H時 |
| {hour:"numeric", hourCycle: "h11"} | h | aK時 | aK時 |
| {hour:"numeric", hourCycle: "h12"} | h | aK時 | ah時 |
| {hour:"numeric", hourCycle: "h23"} | H | H時 | H時 |
| {hour:"numeric", hourCycle: "h24"} | H | H時 | k時 |
In an input skeleton, h is automatically matched to either h or K in the resolved pattern. Similarly, H is matched to either H or k.
Spec:
- https://unicode.org/reports/tr35/tr35-dates.html#availableFormats_appendItems
- https://unicode.org/reports/tr35/tr35-dates.html#dfst-hour
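The last step in the tables above (Resolved Pattern to Final Pattern) can be sketched as a symbol substitution. This is only an illustration, not how any particular engine implements it: `HourCycle` and `apply_hour_cycle` are hypothetical names, and the sketch assumes the skeleton was already chosen from the same 12-hour or 24-hour family as the requested hour cycle.

```rust
/// Hour cycles as named by ECMA-402 and the `-u-hc-` keyword.
/// (Hypothetical type for this sketch, not an ICU4X API.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HourCycle {
    H11,
    H12,
    H23,
    H24,
}

impl HourCycle {
    /// The TR35 pattern symbol for this hour cycle.
    fn symbol(self) -> char {
        match self {
            HourCycle::H11 => 'K',
            HourCycle::H12 => 'h',
            HourCycle::H23 => 'H',
            HourCycle::H24 => 'k',
        }
    }
}

/// Replaces every unquoted hour symbol (h, H, K, k) in a resolved pattern
/// with the symbol for `hc`, keeping the field width. Text inside single
/// quotes is a literal in TR35 patterns and is left untouched.
pub fn apply_hour_cycle(pattern: &str, hc: HourCycle) -> String {
    let mut in_quotes = false;
    pattern
        .chars()
        .map(|c| match c {
            '\'' => {
                in_quotes = !in_quotes;
                c
            }
            'h' | 'H' | 'K' | 'k' if !in_quotes => hc.symbol(),
            _ => c,
        })
        .collect()
}
```

With this sketch, `apply_hour_cycle("h a", HourCycle::H11)` gives `K a` and `apply_hour_cycle("aK時", HourCycle::H12)` gives `ah時`, matching the corresponding rows for "en" and "ja".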
The allowed strings in <timeData>/<hours> are mostly relevant for the C skeleton, so they're not yet relevant to ECMA-402 date-time formatting. (Spec: https://unicode.org/reports/tr35/tr35-dates.html#availableFormats_appendItems)
They're possibly relevant for the stage-3 "Intl Locale Info" proposal. There's a HourCyclesOfLocale operation, which is spec'ed to return the hour-cycle formats which are in "common use for date and time formatting". So this operation could return the allowed values from <timeData>/<hours>.
ICU4C doesn't have a public API to retrieve the allowed values, though. Instead it's necessary to manually read the resource data, cf. DateTimeFormat::GetAllowedHourCycles.
Yes.
K isn't the default hour-cycle for Japan per <timeData>/<hours>, but when selecting {hour: "numeric", hour12: true}, the resolved pattern will contain K, see here. That also means it's not possible to replace hour12=true with hourCycle=h12.
Thanks. So ICU4X is currently missing a way to handle hour12 in a data-driven way.
Just so that I understand the feasibility of hard-coded special cases if this issue isn't addressed in ICU4X itself: It would be possible for ECMA-402 implementation glue code to get correct results (with the scope of what's known about what is in CLDR) by expanding the boolean hour12 and the boolean "region is JP" to hourCycle, right?
That is:
    if hour12 {
        if region_of_locale_is_JP {
            h11
        } else {
            h12
        }
    } else {
        h23
    }
They're possibly relevant for the stage-3 "Intl Locale Info" proposal. There's a HourCyclesOfLocale operation, which is spec'ed to return the hour-cycle formats which are in "common use for date and time formatting". So this operation could return the allowed values from <timeData>/<hours>.
The rendered spec that you linked to has HourCyclesOfLocale, but the README claims "Hour Cycle DROPPED by Champion". @FrankYFTang , is the current intention to include or exclude HourCyclesOfLocale?
ICU4C doesn't have a public API to retrieve the allowed values, though. Instead it's necessary to manually read the resource data, cf. DateTimeFormat::GetAllowedHourCycles.
I don't see any non-test callers for that method. What am I missing?
Just so that I understand the feasibility of hard-coded special cases if this issue isn't addressed in ICU4X itself: It would be possible for ECMA-402 implementation glue code to get correct results (with the scope of what's known about what is in CLDR) by expanding the boolean hour12 and the boolean "region is JP" to hourCycle, right?
It needs to be hard-coded on the language, not the region, because the date-time patterns are in https://github.com/unicode-org/cldr/blob/main/common/main/ja.xml.
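For reference, a minimal sketch of that glue-code expansion keyed on the language subtag rather than the region. `HourCycle` and `expand_hour12` are hypothetical names for illustration, not ICU4X API, and the only special case encoded is the one known from the discussion above (Japanese patterns using K).

```rust
/// Hour cycles as named by ECMA-402 and the `-u-hc-` keyword.
/// (Hypothetical type for this sketch, not an ICU4X API.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HourCycle {
    H11,
    H12,
    H23,
    H24,
}

/// Expands ECMA-402's boolean `hour12` into an hour cycle using a
/// hard-coded special case. `language` is the language subtag of the
/// locale (e.g. "ja" for "ja-JP"); extracting it is left to the caller.
pub fn expand_hour12(hour12: bool, language: &str) -> HourCycle {
    match (hour12, language) {
        // The Japanese 12-hour patterns in CLDR use K (0-11).
        (true, "ja") => HourCycle::H11,
        (true, _) => HourCycle::H12,
        // No locale is known to default to k, so 24-hour is always h23.
        (false, _) => HourCycle::H23,
    }
}
```

This stays correct only as long as no other language's CLDR patterns start using K or k, which is the main argument for a data-driven solution instead.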
I don't see any non-test callers for that method. What am I missing?
Only the parts relevant for the "Unified Intl API" work (bug 1686965) have been committed in bug 1693576. The rest will be put up for review when the open issues in the proposal have been resolved.
There is some interesting code to handle some of this resolution logic in components/datetime/src/pattern/hour_cycle.rs
I also observe that we already have the preferred hour cycle (h11h12 or h23h24) in ICU4X data: https://github.com/unicode-org/icu4x/blob/main/provider/datagen/tests/data/json/datetime/timelengths%401/en.json
So I think everything is here to support hour12 if we were to add it to an options bag somewhere.
Just to make this clearer for those playing along here: Japan is the only region whose allowed hour cycles include K.
<hours preferred="H" allowed="H K h" regions="JP"/>
Ref: https://github.com/unicode-org/cldr/blob/343bde9e7e8d6cf6f2c57e257fa4f074df970311/common/supplemental/supplementalData.xml#L4888
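The options are h, H, K, k and are defined as such:
- h: 12-hour cycle, 1-12
- H: 24-hour cycle, 0-23
- K: 12-hour cycle, 0-11
- k: 24-hour cycle, 1-24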
Currently the ECMA spec incorrectly assumes a coupling of h-k and H-K. That is, the following is baked in as an implicit assumption:
- twelve hour time presented with hours 0-11/00-11 (K) will present twenty four hour time as 00-23 (H)
- twelve hour time presented with hours 1-12/01-12 (h) will present twenty four hour time as 01-24 (k)
The ECMA standard definitely needs to change as the current implementation is a bug. The universal (as far as I've been able to determine) rejection of k, and the only occasional adoption of K as an option renders the above assumption absolutely incorrect, and realistically should have been identified prior to publication. https://github.com/tc39/ecma402/pull/758 has identified a solution that expands how 12-hour and 24-hour time is presented at a regional level. Work is ongoing to get this to a point of acceptance. This is slated for the 2023-09 TC39 meeting.
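To make the coupling concrete, here is a small sketch of the assumption as described in this comment. It is not a transcription of the spec text; `HourCycle` and `coupled_hour12` are hypothetical names for illustration.

```rust
/// Hour cycles as named by ECMA-402 and the `-u-hc-` keyword.
/// (Hypothetical type for this sketch, not an ICU4X API.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HourCycle {
    H11,
    H12,
    H23,
    H24,
}

/// Resolves `hour12` under the coupling assumption: the locale's default
/// hour cycle decides whether hours start at 0 or at 1, and that choice is
/// carried over to both the 12-hour and the 24-hour clock.
pub fn coupled_hour12(hour12: bool, locale_default: HourCycle) -> HourCycle {
    let zero_based = matches!(locale_default, HourCycle::H11 | HourCycle::H23);
    match (hour12, zero_based) {
        (true, true) => HourCycle::H11,
        (true, false) => HourCycle::H12,
        (false, true) => HourCycle::H23,
        (false, false) => HourCycle::H24,
    }
}
```

Under this assumption a locale that defaults to h12 (such as en-US) resolves hour12=false to h24, which is the surprising result noted earlier in the thread.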
We still need to figure out a way to support ECMA-402's hour12 in ICU4X.
It has always been really clunky how there are two ways of specifying almost the same thing.
Thought: should I bring to CLDR a proposal to add variants to HourCycle such as
- auto12 or a12 = pick the best 12-hour variant for the locale
- auto24 or a24 = pick the best 24-hour variant for the locale
It should look at the whole locale when determining the resolved hour cycle. Examples:
| Locale Identifier | Resolved Hour Cycle | Comment |
|---|---|---|
| en-US | H12 | |
| de-DE | H23 | |
| ja-JP | H23 | |
| en-US-u-hc-h11 | H11 | |
| de-DE-u-hc-h11 | H11 | |
| ja-JP-u-hc-h11 | H11 | |
| en-US-u-hc-h12 | H12 | |
| de-DE-u-hc-h12 | H12 | |
| ja-JP-u-hc-h12 | H12 | |
| en-US-u-hc-h23 | H23 | |
| de-DE-u-hc-h23 | H23 | |
| ja-JP-u-hc-h23 | H23 | |
| en-US-u-hc-a12 | H12 | |
| de-DE-u-hc-a12 | H12 | |
| ja-JP-u-hc-a12 | H11 | <== this is the interesting one |
| en-US-u-hc-a23 | H23 | |
| de-DE-u-hc-a23 | H23 | |
| ja-JP-u-hc-a23 | H23 | |
Or maybe this should just go as an option on the time field set, more like ECMA-402 does it. It would be an enum with 3 variants (or an Option of a 2-variant enum): Auto, Prefer12, and Prefer24.
| Locale Identifier | Hour Cycle Option | Resolved Hour Cycle | Comment |
|---|---|---|---|
| en-US | Auto | H12 | |
| de-DE | Auto | H23 | |
| ja-JP | Auto | H23 | |
| es-MX-u-hc-h11 | Auto | H11 | |
| en-US | Prefer12 | H12 | |
| de-DE | Prefer12 | H12 | |
| ja-JP | Prefer12 | H11 | <== interesting case |
| es-MX-u-hc-h11 | Prefer12 | H11 | <== interesting case |
| en-US | Prefer24 | H23 | |
| de-DE | Prefer24 | H23 | |
| ja-JP | Prefer24 | H23 | |
| es-MX-u-hc-h11 | Prefer24 | H23 | |
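A rough sketch of how the option-based resolution in the table above could work. All names here (`HourCyclePreference`, `LocaleHours`, `resolve`) are hypothetical and not existing ICU4X API, and the per-locale data is assumed to be supplied by the caller rather than loaded from CLDR.

```rust
/// Hour cycles as named by ECMA-402 and the `-u-hc-` keyword.
/// (Hypothetical type for this sketch, not an ICU4X API.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HourCycle {
    H11,
    H12,
    H23,
    H24,
}

impl HourCycle {
    fn is_12_hour(self) -> bool {
        matches!(self, HourCycle::H11 | HourCycle::H12)
    }
}

/// The proposed three-variant option.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HourCyclePreference {
    Auto,
    Prefer12,
    Prefer24,
}

/// Hypothetical per-locale data: the default cycle plus the best
/// 12-hour and 24-hour variants for that locale.
#[derive(Clone, Copy, Debug)]
pub struct LocaleHours {
    pub default: HourCycle,
    pub best12: HourCycle,
    pub best24: HourCycle,
}

/// Resolves the hour cycle from locale data, an optional `-u-hc-` keyword
/// value, and the option. A keyword is honored only when it agrees with
/// the requested clock family; otherwise the locale's best variant wins.
pub fn resolve(
    data: LocaleHours,
    hc_keyword: Option<HourCycle>,
    pref: HourCyclePreference,
) -> HourCycle {
    match pref {
        HourCyclePreference::Auto => hc_keyword.unwrap_or(data.default),
        HourCyclePreference::Prefer12 => match hc_keyword {
            Some(hc) if hc.is_12_hour() => hc,
            _ => data.best12,
        },
        HourCyclePreference::Prefer24 => match hc_keyword {
            Some(hc) if !hc.is_12_hour() => hc,
            _ => data.best24,
        },
    }
}
```

For ja-JP this would be called with `best12: HourCycle::H11`, so `Prefer12` resolves to H11 while `Auto` and `Prefer24` resolve to H23, as in the table.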
CLDR ticket: https://unicode-org.atlassian.net/browse/CLDR-18894
A potential proposal:
| Value of -u-hc | Name | Description | Comments |
|---|---|---|---|
| h11 | H11 | 12-hour cycle, 0-11 | |
| h12 | H12 | 12-hour cycle, 1-12 | |
| h23 | H23 | 24-hour cycle, 0-23 | |
| h24 | H24 | 24-hour cycle, 1-24 | Might remove |
| c12 | Clock12 | 12-hour cycle | |
| c24 | Clock24 | 24-hour cycle | Not required if H24 is removed |
I'm pulling this up to 2.2 since it is being added to CLDR in the next release.