icu icon indicating copy to clipboard operation
icu copied to clipboard

ICU-22763 MF2: Handle time zones correctly in :datetime

Open catamorphism opened this issue 1 year ago • 19 comments

Previously, the time zone components of date/time literals were ignored. In order to store the time zone properly for re-formatting, this change also changes the representation of Formattable date values to use a GregorianCalendar rather than a UDate.

This is a public API change and a design doc can be found in the "ICU 76 API proposal status" document (and was also emailed to the icu-design list on April 29.)

In the TC discussion on May 9, there was some uncertainty about whether to use GregorianCalendar or Calendar as the return type of message2::Formattable::getDate(). I'm leaving it as GregorianCalendar for now but can change it if necessary.

Checklist
  • [x] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22763
  • [x] Required: The PR title must be prefixed with a JIRA Issue number.
  • [x] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • [x] Required: Each commit message must be prefixed with a JIRA Issue number.
  • [x] Issue accepted (done by Technical Committee after discussion)
  • [x] Tests included, if applicable
  • [x] API docs and/or User Guide docs changed or added, if applicable

catamorphism avatar May 16 '24 20:05 catamorphism

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/messageformat2_function_registry.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/unicode/messageformat2_formattable.h is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

cc @echeran @mihnita

catamorphism avatar May 16 '24 22:05 catamorphism

@FrankYFTang setting you as the main reviewer because Mihai is ooo for a while soon. Note that the API change proposal is still under discussion.

markusicu avatar May 23 '24 16:05 markusicu

Here is my view

  1. Currently, the Message Format proposal only accept string in ISO8601, without the support of timeZone Extension and Calendar extension as stated in RFC 9557 https://datatracker.ietf.org/doc/html/rfc9557

  2. therefore, the only additional information it need to and should be allowed to get back is for the case of the string has UTC offset, and ISO8601 only allow it to be in minutes precision. " 5.3.4.1 Time shift between local time scale and UTC The time shift between the local time scale and UTC can be expressed in hours and minutes, or hours only "

  3. therefore, the requirement is to allow the caller to get back this information only, but not open up a can of worm that more than that. and we should also allow the interface to distinguish the case that the string has Z (UTC) or no Z.

  4. Currently, in ICU, we can get the zone offset from Calendar.get(UCAL_ZONE_OFFSET) and the DST offset from Calendar.get(UCAL_DST_OFFSET) and the UTC offset is the sum of that in millseconds
    So I think we should define a new type for input and return, which has

  5. a UDate

  6. int32_t rawOffsetGMT - represent the zone offset + dst offset in millisecond and this timezone (2 above) can be used ot create a TimeZone by

icu::SimpleTimeZone::SimpleTimeZone ( int32_t rawOffsetGMT, const UnicodeString & ID )

FrankYFTang avatar Jun 13 '24 20:06 FrankYFTang

Maybe something like

struct {
  UDate date;
  int32_t rawOffsetGMT;
} DateAndGMTOffset

?

FrankYFTang avatar Jun 13 '24 21:06 FrankYFTang

Note that MF2 will not stay with only supporting vanilla 8601 strings. Real time zone support is important. An offset is insufficient.

aphillips avatar Jun 13 '24 21:06 aphillips

Note that MF2 will not stay with only supporting vanilla 8601 strings. Real time zone support is important. An offset is insufficient.

that is what I asked this morning and the answer I got back is only vanilla 8601 strings. if MF2 is going to support RFC 9557 https://datatracker.ietf.org/doc/html/rfc9557 then we will need to consider not just timezone extension in RFC9557 but also calendar extension in RFC 9557. Therefore, it will be a super bad idea to use GregorianCalendar since the calendar could be in a different calendar system.

We should consider the following cases

  1. the string is with a Z
  2. the string is with a UTC offset (both 1 and 2 are just vanilla 8601 strings)
  3. the string is with timeZone extension as in RFC 9557
  4. the string is with calendar extnesion as in RFC 9557
  5. any combination of 1,2,3,4

For 4, we need a string to indicate the real time zone name but not more than that For 5, we need a string to indicate the calendar name but not more than tat

how about

struct {
  UDate date;
  int32_t rawOffsetGMT;
  const char* zoneName; // "Z" if UTC, nullptr if timeZone offset, other if from RFC9557
  const char* calendarName; // nullptr if not specified
} DateInfo

FrankYFTang avatar Jun 13 '24 22:06 FrankYFTang

that is what I asked this morning and the answer I got back is only vanilla 8601 strings.

I apologize for giving the wrong answer -- I was answering based on the existing spec. @aphillips is more of an authority than me about how the spec will evolve in the future.

catamorphism avatar Jun 13 '24 23:06 catamorphism

(personal response, chair hat off)

The offset is irrelevant. The only thing an offset is useful for is adjusting the date value (the incremental time value--Temporal calls this an Instant). Once the incremental time value is computed, you can safely forget the offset. (In most situations, I would argue that the value should be normalized to UTC and the implementation should forget the offset) The time zone, however, is used in expression of the date value later. The time zone can also be changed or removed ("floating" the value).

The use of non-Gregorian calendars in date/time serializations is more problematic: there's not a lot of implementation experience, particularly with non-binary multi-era calendars. It's usually better (or at least more common) to use a proleptic Gregorian calendar (i.e. 8601) on the wire (especially for incremental time values!) and convert using calendar rules for processing/display. Perhaps this will evolve and mature further? My experience here is limited.

There also needs to be a concept of a floating time value (one not tied to specific instants on the timeline). This could use null for the time zone. Perhaps:

struct {
  UDate date; // what is a `UDate` anyway? Is it millis since the Unix epoch?
  const char* zoneName; // IANA tz name; "UTC" if UTC; nullptr if value is floating
  const char* calendarName; // nullptr if not specified; proleptic Gregorian (8601) calendar is default
} DateInfo

In any case, this was a significant gap in the MF2 tech preview. It's important for us to add time zone management to MF2 before LDML46 and the ICU4C implementation should be designed/adapted with that in mind.

aphillips avatar Jun 14 '24 15:06 aphillips

Thanks for the feedback, @aphillips and @FrankYFTang . I'll take a look at these options and update the design doc accordingly.

My first thought, though, is why not something like:

struct {
  UDate date;
  SimpleTimeZone tz; // Captures offset and time zone name
  const char* calendarName; // nullptr if not specified
} DateInfo

Why not re-use the existing SimpleTimeZone class?

catamorphism avatar Jun 17 '24 18:06 catamorphism

@FrankYFTang I've made changes to match the suggestions; can you take another look? I've also updated the design proposal in the "ICU 76 API proposal status" doc.

catamorphism avatar Jun 18 '24 21:06 catamorphism

needs squash

srl295 avatar Jun 28 '24 21:06 srl295

@FrankYFTang Are you OK with the current state of the design doc for this PR? The JIRA ticket shows as "Accepted", but I've made changes to the design doc since your last comment.

catamorphism avatar Jul 19 '24 00:07 catamorphism

In 197caef, I added some comments explaining the meaning of the fields in the DateInfo struct. These can be checked against the code in messageformat2_formattable.cpp that formats an existing DateInfo struct, and that parses a date literal into a DateInfo -- the two locations where the DateFormat::adoptTimeZone() method is called.

catamorphism avatar Aug 08 '24 22:08 catamorphism

I also removed the calendarName field in 9800253. On further thought, it doesn't seem necessary; a DateInfo represents a parsed date literal, and date literal strings don't include a calendar name. In the future, calendar will be an option on the :datetime formatter. Then it will be handled the same way as all other options, and the return value from the formatter will represent the previously passed-in calendar by default. Therefore, I don't think we need it in the DateInfo struct, which is meant to represent the parsed version of a date literal string.

catamorphism avatar Aug 08 '24 22:08 catamorphism

This needs to be rebased because of #3050 moving where the test data is, but I'll do that at the very end in order to avoid losing review comments.

catamorphism avatar Aug 08 '24 22:08 catamorphism

By the way, I'll be on vacation from now until September 9, so I won't see further comments until I return.

catamorphism avatar Aug 13 '24 23:08 catamorphism

@FrankYFTang ping?

catamorphism avatar Oct 07 '24 18:10 catamorphism

@FrankYFTang Ping?

catamorphism avatar Feb 25 '25 22:02 catamorphism

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/messageformat2_formattable.cpp is different
  • icu4c/source/i18n/messageformat2_function_registry_internal.h is different
  • icu4c/source/i18n/messageformat2_function_registry.cpp is different
  • icu4c/source/i18n/tznames.cpp is different
  • icu4c/source/i18n/ucln_in.h is different
  • icu4c/source/i18n/unicode/messageformat2_formattable.h is different
  • icu4c/source/test/intltest/messageformat2test_utils.h is different
  • icu4c/source/test/intltest/messageformat2test.cpp is different
  • icu4c/source/test/testdata/message2/more-functions.json is no longer changed in the branch
  • icu4j/main/core/src/test/resources/com/ibm/icu/dev/test/message2/more-functions.json is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang I've fixed the remaining issues and rebased.

catamorphism avatar Mar 05 '25 03:03 catamorphism