icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

DateTimeFormatter still lacks power user APIs

Open sffc opened this issue 2 years ago • 9 comments

In https://github.com/unicode-org/icu4x/issues/380 and the corresponding design doc, I suggested that DateTimeFormatter have multiple layers:

At a very basic level, I think we should split DateTimeFormat as follows:

  • DateTimeFormat: top-level, all-inclusive kitchen sink

    • Contains one or more DateTimePatternInterpolator and one or more TimeZoneFormat
    • Contains constructors for each of the types of DateTimeFormat we want to build
  • DateTimePatternInterpolator: formats a specific pattern

    • Contains a DataPayload for a single pattern being used (zero-alloc when possible)
    • Does not support time zones directly; needs a TimeZoneFormat
  • TimeZoneFormat: formats a time zone

    • (already done)

Internally, we will likely want additional classes to do things like skeleton resolution. This does not need to be public API. An argument could be made that DateTimePatternInterpolator should be internal as well.

It seems this got dropped and not resolved in that older issue.

sffc avatar Apr 19 '23 05:04 sffc

Relatedly: I'm warming up to the idea that the DateTimePattern constructor should resolve the pattern parts into a Vec or SmallVec. It will make the formatting code much more straightforward, and most of the time these patterns will be fewer than ~10 PatternItems, so they can be easily stored on the stack in a SmallVec.

sffc avatar Apr 21 '23 21:04 sffc

Discuss with:

  • @sffc
  • @Manishearth
  • @zbraniecki
  • @eggrobin

Optional:

  • @echeran

sffc avatar May 11 '23 18:05 sffc

The new API should enable the locale-specific symbols to be pre-loaded but provide a pattern at runtime. This is not dissimilar to the approach we're taking with FixedDecimalFormatter, where we push locale-invariant settings from construction time into format time.

sffc avatar May 13 '23 06:05 sffc

Tentative conclusion:

  1. DateTimePatternInterpolator does not own a pattern, but it has Option<DataPayload> fields for all the datetime symbols
  2. It has &mut self loading functions to fill in the slots
  3. It takes either a &runtime::Pattern or a impl Iterator<Item = PatternItem> in the terminal function, returning a FormattedDateTime type return value
  4. Wait until icu_pattern is landed before implementing this.

LGTM: @sffc @zbraniecki (@robertbastian) (@Manishearth)

sffc avatar Jun 28 '23 09:06 sffc

Additional notes on the design:

  • NeoDateTimeFormatter owns a DateTimePatternInterpolator and 1 or more runtime::Patterns. In the format function, it selects the correct pattern for the datetime input and returns a NeoFormattedDateTime that borrows the pattern, the input, and the interpolator data.
  • Alternative naming: DateTimeSymbolBag (for the thing that owns symbols) and DateTimePatternInterpolator (for the thing that borrows symbols and a pattern)
  • We considered adding a type that owns a single pattern and an interpolator, but its use cache is niche and it can just be served by NeoDateTimeFormatter.
  • An interesting proposition for DateTimePatternInterpolator is the following API:
struct NeoDateTimeFormatter {
    interpolator: DateTimePatternInterpolator,
    patterns: ShortVec<runtime::Pattern>
}

impl NeoDateTimeFormatter {
    pub fn format<'l>(&'l self, input: impl DateTimeInput) -> NeoFormattedDateTime<'l> {
        // First: select the correct pattern. OK to special-case when there is only a single pattern.
        self.interpolator.format_with_pattern(&pattern, input)
    }

impl DateTimePatternInterpolator {
    // One of these per DataKey:
    pub fn load_month_symbols(&mut self, provider: &P, length: FieldLength) -> Result<&mut Self> {
        // ...
    }

    // also include_month_symbols for the compiled_data version

    pub fn load_for_pattern<'a>(&'a mut self, provider: &P, pattern: &'a runtime::Pattern) -> Result<DateTimePatternInterpolatorForPattern<'a>> {
        // load the data, then pack the references into the return value
    }
    
    // compiled_data: include_for_pattern
    
    pub fn with_pattern<'a>(&'a self, pattern: &'a runtime::Pattern) -> Result<DateTimePatternInterpolatorWithPattern<'a>> {
        // returns an error if the data isn't loaded
    }

    pub fn format_with_pattern<'l>(&'l self, pattern: &'l runtime::Pattern, input: impl DateTimeInput) -> NeoFormattedDateTime<'l> { ... }
}

struct DateTimePatternInterpolatorWithPattern<'a> {
    pattern: &'a runtime::Pattern,
    interpolator: DateTimePatternInterpolatorBorrowed<'a>,
}

impl<'a> DateTimePatternInterpolatorWithPattern<'a> {
    pub fn format<'l>(&'l self, input: impl DateTimeInput) -> NeoFormattedDateTime<'l> {
        // pack the additional references into NeoFormattedDateTime
    }
}

The call site would look like this:

// Option 1: Use the same interpolator for multiple patterns (always in the same locale and calendar system)
let mut interpolator = DateTimePatternInterpolator::<Gregorian>::new(&locale!("en").into());
interpolator.load_with_pattern(provider, "M/d/y".parse().unwrap())
    .unwrap()
    .format("2023-10-25".parse().unwrap())
    // this won't fail if using an ICU4X impl of DateTimeInput:
    .write_to_string();
interpolator.load_with_pattern(provider, "MMM d, y".parse().unwrap())
    .unwrap()
    .format("2023-10-25".parse().unwrap())
    .write_to_string();

// Option 2: All in a single line (less cached data)
DateTimePatternInterpolator::<Gregorian>::new(&locale!("en").into())
    .load_with_pattern(provider, "MMM d, y".parse().unwrap())
    .unwrap()
    .format("2023-10-25".parse().unwrap())
    .write_to_string();

// Option 3: Without the intermediate type:
let mut interpolator = DateTimePatternInterpolator::<Gregorian>::new(&locale!("en").into());
let pattern = "MMM d, y".parse().unwrap();
interpolator.load_with_pattern(provider, &pattern).unwrap();
// Here we don't have a place to keep the invariant that we have loaded the correct symbols, so this needs to be fallible (with a core::fmt::Error):
interpolator.format_with_pattern(&pattern, "2023-10-25".parse().unwrap());

Any additional feedback before I implement this?

sffc avatar Oct 26 '23 01:10 sffc

  • @robertbastian - new should move the locale, or else it should borrow the locale with a lifetime
  • @sffc - This is not consistent with TimeZoneFormatter but I'm fine diverging (and perhaps making a 2.0 to fix TimeZoneFormatter)
  • @robertbastian - TZF takes a reference and clones, that's the worst solution out of these 3

sffc avatar Oct 26 '23 18:10 sffc

Naming bikeshed for DateTimePatternInterpolator:

  1. DateTimeSymbols
  2. DateTimeSymbol[s]Bag
  3. DateTimeData
  4. DateTimeDataBag
  5. DateTimeSymbol[s]Cache
  6. DateTimeSymbol[s]Loader
  7. DateTimeSymbol[s]Holder
  8. DateTimePatternFormatter
  9. DateTimeSymbol[s]Store
  10. DateTimeFormatInfo
  11. DateTimePatternSymbols
  • @sffc - Not a fan of DateTimeSymbols due to ICU4C baggage with that name. Not a fan of Bag because we use that word for options bags. Not a fan of Cache because we advertise caches as something that lives outside of ICU4X. Store is okay but we use that word as a trait name in other places.
  • @robertbastian - Not a fan of Data because it's a filler word
  • @Manishearth - It should be named for what it's for, not what it's used internally. People should find it if they want to format patterns directly. The key thing is patterns, not sumbols.

Will temporarily go with:

  • DateTimePatternSymbols which owns symbols
  • DateTimePatternFormatter which borrows the symbols and the pattern

LGTM: (@sffc) (@Manishearth)

sffc avatar Oct 26 '23 18:10 sffc

My PR #4204 uses the name TypedDateTimePatternInterpolator as proposed in the OP. It doesn't do everything proposed yet so the name is still open.

Some other ideas:

  • DateTimeNames
  • DateTimeSymbolData
  • DateTimeFormatterData
  • BorrowedDateTimeNamesWithPattern

sffc avatar Nov 21 '23 00:11 sffc

Also we can add helper functions on DateTimeNames to get specific symbols like the AM/PM symbol.

sffc avatar Dec 20 '23 00:12 sffc