CLDR semantic datetime skeleton spec is nearly ready and MF2 should use it

Open sffc opened this issue 1 year ago • 19 comments

The default registry still lists "field options" as being valid configuration for MF2.

https://github.com/unicode-org/message-format-wg/blob/main/exploration/default-registry-and-mf1-compatibility.md#field-options

As an implementer, I am troubled by this requirement creeping into MF2. It is well established that "field options" are filled with footguns and edge cases. Old-style datetime skeletons not only require that implementations like ICU4X ship larger code and data, they often do not encourage i18n best practices.

I have been working on a specification to solve these issues called "semantic skeleta". We have been discussing this almost every week in the CLDR Design WG, which meets on Mondays at 10am (unfortunately the same time slot as the MF WG meeting). Now that the spec draft is nearly complete, I wanted to share it here.

https://docs.google.com/document/d/1dmMk_XODm3DGe84GmMVw7O6a7oIs5yojDXm_25CcMbw/edit#heading=h.d9pp2vm43mob

This is not the first time I've raised this issue, but previously, semantic skeleta were only a design doc. I am posting this issue now to raise awareness about the upcoming technology preview in UTS 35, which I think MF2 should embrace.

sffc avatar Aug 22 '24 20:08 sffc

@sffc Thanks for the update.

Please note that the document you linked is a design document. There is specification text in the registry.md document that also uses these various options.

The existing skeleton support in ICU did not make the cut for LDML45 and isn't supported, AFAIK, in MF2 except via "option bags". I look forward to seeing more of semantic skeletons. The exact way that we integrate such support in MF2 is at a critical juncture. If possible, it would be good to avoid creating sets of options that are deprecated shortly afterwards, but required for conformance. At the same time, we might be just a little ahead of being able to offer the new stuff. I look forward to a discussion here.

aphillips avatar Aug 25 '24 18:08 aphillips

Supporting "options bags of individual field lengths" puts the same requirements on implementations such as ICU4X. In particular, it is highly customizable, meaning that implementations need to ship the DateTimePatternGenerator and all the code and data required for it. Semantic skeleta, on the other hand, are a strict subset designed to represent classical skeletons that "actually make sense", a small enough set that implementations can pre-compute the patterns ahead of time. So, requiring "options bags of individual field lengths" directly harms implementations (currently ICU4X but likely more in the future), and by extension clients of those implementations.

(I haven't brought this to ICU4X-TC for a formal recommendation but personally I think this rises to the level of "a concern that must be resolved during tech preview")

sffc avatar Aug 26 '24 16:08 sffc

@sffc Could you share a couple of examples of what a semantic skeleton value could look like when used as an MF2 option value? It's not immediately obvious to me from the linked doc what they look like.

eemeli avatar Aug 26 '24 16:08 eemeli

The spec defines the schema, not a specific interface, but for MessageFormat 2.0, an interface could look something like

{$someDate :datetime fieldSet=[year, month, day] length=medium}

ICU4X is planning to use all-caps identifiers for the field sets, which MF2 could also choose to adopt (if that happened, we'd probably put them into the semantic skeleton spec)

{$someDate :datetime fieldSet=YMD length=medium}

Please note with semantic skeleta that not all field sets are well-defined. If you request a field set [year, hour], that is considered a syntax error.

sffc avatar Aug 26 '24 16:08 sffc

> Please note with semantic skeleta that not all field sets are well-defined. If you request a field set [year, hour], that is considered a syntax error.

Is "well-defined" a conformance term (the way we use valid and well-formed in say BCP47)?

I thought at one point there were enumerated names for the well-defined field sets, such as YearMonth etc. with the idea being that only useful ones would be defined.

aphillips avatar Aug 26 '24 16:08 aphillips

An implementation should reject something like fieldSet=[year, hour] length=medium in order to be conformant, if that's what you're asking. That's a good call-out that I'll make sure gets into the semantic skeleta spec.

Yes, the spec lists out the field sets that are well-defined.
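
A minimal sketch of what such a conformance check could look like, in TypeScript. The field names, the `isWellDefined` helper, and the allow-list below are assumptions made for illustration; the authoritative list of well-defined field sets is the one in the semantic skeleton spec, not this one.

```typescript
// Hypothetical field names, for illustration only.
type Field = "year" | "month" | "day" | "weekday" | "hour" | "minute" | "second";

// Fixed ordering used to build a canonical key, so that
// [day, year, month] and [year, month, day] compare equal.
const FIELD_ORDER: Field[] = ["year", "month", "day", "weekday", "hour", "minute", "second"];

function canonicalKey(fields: Field[]): string {
  // Keep each known field at most once, in FIELD_ORDER order.
  return FIELD_ORDER.filter((f) => fields.indexOf(f) !== -1).join(",");
}

// Illustrative allow-list (an assumption, NOT copied from the spec).
const WELL_DEFINED: string[] = [
  canonicalKey(["year", "month", "day"]),
  canonicalKey(["year", "month"]),
  canonicalKey(["month", "day"]),
  canonicalKey(["hour", "minute"]),
  canonicalKey(["year", "month", "day", "hour", "minute"]),
];

// Returns true only for field sets on the allow-list; anything else,
// such as [year, hour], should be rejected by the caller.
function isWellDefined(fields: Field[]): boolean {
  return fields.length > 0 && WELL_DEFINED.indexOf(canonicalKey(fields)) !== -1;
}
```

With this allow-list, `isWellDefined(["year", "month", "day"])` is accepted and `isWellDefined(["year", "hour"])` is rejected. Because the set is small and closed, an implementation could also precompute one pattern per entry instead of generating patterns at runtime.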

sffc avatar Aug 26 '24 16:08 sffc

Note that rejecting the options bag and taking in semantic skeletons means that the ICU4C and ICU4J up-to-date implementations of MF2 will have to wait until the semantic skeletons are also implemented in ICU4C and ICU4J.

I am not saying we should / should not do it.

Just saying that we would probably have to move all the option-bag behavior we have now to a namespace (draft) so that people have something to test with.

Any feedback we get in that space would not be as relevant.

And many might wait for adoption until the next release of ICU.


Also, not supporting option bags means that MF2 does not align with the current ECMAScript style for DateFormat.

mihnita avatar Aug 28 '24 18:08 mihnita

> Note that rejecting the options bag and taking in semantic skeletons means that the ICU4C and ICU4J up-to-date implementations of MF2 will have to wait until the semantic skeletons are also implemented in ICU4C and ICU4J.

I want to emphasize that semantic skeleta are designed to be implemented on top of a library that implements classical skeleta. In ICU4X, there are about 100 lines of code that sit between the semantic skeleton API and the classical skeleton API.

sffc avatar Aug 28 '24 18:08 sffc

Thus far, the option sets for :number and :datetime have been kept as subsets of the options available in the JS Intl formatters. Departing from that approach is something that ought to be discussed also in TC39 TG2. Is there any intent of proposing semantic skeletons for adoption in Intl.DateTimeFormat?

eemeli avatar Aug 29 '24 11:08 eemeli

Semantic skeleta are designed to be implemented on top of classical skeleta, which includes ECMAScript-style options bags. In other words, semantic skeleta are a subset of Intl.DateTimeFormat with a facelift.
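
As a rough sketch of that layering, here is how a thin TypeScript shim could translate a semantic skeleton into an `Intl.DateTimeFormat` options bag. The `FieldSet` and `Length` types, the `toOptionsBag` and `formatSemantic` helpers, and the length-to-month-style mapping are all assumptions for illustration; the normative mapping is the one defined in the semantic skeleton spec.

```typescript
// Hypothetical subset of field sets and lengths, for illustration only.
type FieldSet = "YMD" | "YM" | "MD";
type Length = "long" | "medium" | "short";

// Translate a semantic skeleton into a classical ECMAScript options bag.
// Assumed mapping: long -> spelled-out month, medium -> abbreviated month,
// short -> numeric month.
function toOptionsBag(fieldSet: FieldSet, length: Length): Intl.DateTimeFormatOptions {
  const month: "long" | "short" | "numeric" =
    length === "long" ? "long" : length === "medium" ? "short" : "numeric";
  const options: Intl.DateTimeFormatOptions = {};
  if (fieldSet.indexOf("Y") !== -1) options.year = "numeric";
  if (fieldSet.indexOf("M") !== -1) options.month = month;
  if (fieldSet.indexOf("D") !== -1) options.day = "numeric";
  return options;
}

// Format a date by delegating to the existing Intl.DateTimeFormat machinery.
// The time zone is pinned to UTC so the result does not depend on the host.
function formatSemantic(locale: string, date: Date, fieldSet: FieldSet, length: Length): string {
  const options = toOptionsBag(fieldSet, length);
  options.timeZone = "UTC";
  return new Intl.DateTimeFormat(locale, options).format(date);
}
```

A call such as `formatSemantic("en", new Date(Date.UTC(2024, 7, 22)), "YMD", "medium")` then goes through the classical formatter unchanged; only the set of accepted inputs is narrower.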

sffc avatar Aug 29 '24 22:08 sffc

I am tagging this as "Future" because it will not meet the cutoff for LDML46. It will still be considered prior to exiting Tech Preview, which is expected in the 2024 calendar year.

aphillips avatar Sep 09 '24 18:09 aphillips

The semantic skeleton spec technical preview was just approved for CLDR 46. https://github.com/unicode-org/cldr/pull/4031

Please note the section defining how to map from a semantic skeleton to a classical skeleton.

sffc avatar Sep 11 '24 16:09 sffc

> @aphillips added LDML 47 and removed LDML46.1 labels last week

I assume this means that we still plan to fix this in LDML 47.

sffc avatar Nov 25 '24 17:11 sffc

> I assume this means that we still plan to fix this in LDML 47.

This means that the WG would like it to be addressed in 47 and will consider it for 47. We still need a proposal etc.

aphillips avatar Nov 25 '24 17:11 aphillips

> > I assume this means that we still plan to fix this in LDML 47.
>
> This means that the WG would like it to be addressed in 47 and will consider it for 47. We still need a proposal etc.

In TC39, proposals do not advance until all feedback has been fixed or punted; proposal champions are responsible for making sure action items get addressed. Is this not how MFWG operates? Are you waiting on someone to make a proposal here?

sffc avatar Nov 25 '24 17:11 sffc

A proposal would be very helpful. This can take the form of a design or it can just be a PR against the registry. We are working on regularizing the registry maintenance process. The proposed process is here and you can see us starting to use aspects of this in 46.1.

I don't see anyone else currently championing skeletons. If you feel strongly, don't hold back! Telling the working group to do it might not result in action on its own.

aphillips avatar Nov 25 '24 22:11 aphillips

It's still extremely unclear to me what the status is of datetime formatting options. I am confused because in registry.md, the language "The following options and their values are required to be available on the function" is present for :number, :integer, :math, and others, but not on :date or :datetime. This suggests to me that these options are either a draft or are recommended ("proposed"?) but not required.

What I want to see:

  • LDML 47 requires support for dateStyle and timeStyle options.
  • LDML does not require support for the field options.
  • ICU 77 namespaces them with icu: or something.

sffc avatar Dec 28 '24 06:12 sffc

Given deadlines, this won't make v47. The functions in question won't be stable then, so I am marking this for 48.

aphillips avatar Feb 14 '25 15:02 aphillips

@sffc I did reserve time in a previous call for this, but you were not available that day. Due to the timing of v47, this will not appear in the 2025-02-17 agenda. However, the :datetime function will not be stable in v47, so hopefully you agree that discussing in the 2025-02-24 call (or later) for inclusion in v48 is appropriate.

aphillips avatar Feb 14 '25 17:02 aphillips

This direction was chosen in #1083.

eemeli avatar Jul 23 '25 16:07 eemeli