icu4x
icu4x copied to clipboard
DateTimeFormatter still lacks power user APIs
In https://github.com/unicode-org/icu4x/issues/380 and the corresponding design doc, I suggested that DateTimeFormatter have multiple layers:
At a very basic level, I think we should split DateTimeFormat as follows:
DateTimeFormat: top-level, all-inclusive kitchen sink
- Contains one or more DateTimePatternInterpolator and one or more TimeZoneFormat
- Contains constructors for each of the types of DateTimeFormat we want to build
DateTimePatternInterpolator: formats a specific pattern
- Contains a DataPayload for a single pattern being used (zero-alloc when possible)
- Does not support time zones directly; needs a TimeZoneFormat
TimeZoneFormat: formats a time zone
- (already done)
Internally, we will likely want additional classes to do things like skeleton resolution. This does not need to be public API. An argument could be made that DateTimePatternInterpolator should be internal as well.
It seems this got dropped and not resolved in that older issue.
Relatedly: I'm warming up to the idea that the DateTimePattern constructor should resolve the pattern parts into a Vec or SmallVec. It will make the formatting code much more straightforward, and most of the time these patterns will be fewer than ~10 PatternItems, so they can be easily stored on the stack in a SmallVec.
Discuss with:
- @sffc
- @Manishearth
- @zbraniecki
- @eggrobin
Optional:
- @echeran
The new API should enable the locale-specific symbols to be pre-loaded but provide a pattern at runtime. This is not dissimilar to the approach we're taking with FixedDecimalFormatter, where we push locale-invariant settings from construction time into format time.
Tentative conclusion:
DateTimePatternInterpolatordoes not own a pattern, but it hasOption<DataPayload>fields for all the datetime symbols- It has
&mut selfloading functions to fill in the slots - It takes either a
&runtime::Patternor aimpl Iterator<Item = PatternItem>in the terminal function, returning a FormattedDateTime type return value - Wait until icu_pattern is landed before implementing this.
LGTM: @sffc @zbraniecki (@robertbastian) (@Manishearth)
Additional notes on the design:
NeoDateTimeFormatterowns aDateTimePatternInterpolatorand 1 or moreruntime::Patterns. In the format function, it selects the correct pattern for the datetime input and returns aNeoFormattedDateTimethat borrows the pattern, the input, and the interpolator data.- Alternative naming:
DateTimeSymbolBag(for the thing that owns symbols) andDateTimePatternInterpolator(for the thing that borrows symbols and a pattern) - We considered adding a type that owns a single pattern and an interpolator, but its use cache is niche and it can just be served by
NeoDateTimeFormatter. - An interesting proposition for
DateTimePatternInterpolatoris the following API:
struct NeoDateTimeFormatter {
interpolator: DateTimePatternInterpolator,
patterns: ShortVec<runtime::Pattern>
}
impl NeoDateTimeFormatter {
pub fn format<'l>(&'l self, input: impl DateTimeInput) -> NeoFormattedDateTime<'l> {
// First: select the correct pattern. OK to special-case when there is only a single pattern.
self.interpolator.format_with_pattern(&pattern, input)
}
impl DateTimePatternInterpolator {
// One of these per DataKey:
pub fn load_month_symbols(&mut self, provider: &P, length: FieldLength) -> Result<&mut Self> {
// ...
}
// also include_month_symbols for the compiled_data version
pub fn load_for_pattern<'a>(&'a mut self, provider: &P, pattern: &'a runtime::Pattern) -> Result<DateTimePatternInterpolatorForPattern<'a>> {
// load the data, then pack the references into the return value
}
// compiled_data: include_for_pattern
pub fn with_pattern<'a>(&'a self, pattern: &'a runtime::Pattern) -> Result<DateTimePatternInterpolatorWithPattern<'a>> {
// returns an error if the data isn't loaded
}
pub fn format_with_pattern<'l>(&'l self, pattern: &'l runtime::Pattern, input: impl DateTimeInput) -> NeoFormattedDateTime<'l> { ... }
}
struct DateTimePatternInterpolatorWithPattern<'a> {
pattern: &'a runtime::Pattern,
interpolator: DateTimePatternInterpolatorBorrowed<'a>,
}
impl<'a> DateTimePatternInterpolatorWithPattern<'a> {
pub fn format<'l>(&'l self, input: impl DateTimeInput) -> NeoFormattedDateTime<'l> {
// pack the additional references into NeoFormattedDateTime
}
}
The call site would look like this:
// Option 1: Use the same interpolator for multiple patterns (always in the same locale and calendar system)
let mut interpolator = DateTimePatternInterpolator::<Gregorian>::new(&locale!("en").into());
interpolator.load_with_pattern(provider, "M/d/y".parse().unwrap())
.unwrap()
.format("2023-10-25".parse().unwrap())
// this won't fail if using an ICU4X impl of DateTimeInput:
.write_to_string();
interpolator.load_with_pattern(provider, "MMM d, y".parse().unwrap())
.unwrap()
.format("2023-10-25".parse().unwrap())
.write_to_string();
// Option 2: All in a single line (less cached data)
DateTimePatternInterpolator::<Gregorian>::new(&locale!("en").into())
.load_with_pattern(provider, "MMM d, y".parse().unwrap())
.unwrap()
.format("2023-10-25".parse().unwrap())
.write_to_string();
// Option 3: Without the intermediate type:
let mut interpolator = DateTimePatternInterpolator::<Gregorian>::new(&locale!("en").into());
let pattern = "MMM d, y".parse().unwrap();
interpolator.load_with_pattern(provider, &pattern).unwrap();
// Here we don't have a place to keep the invariant that we have loaded the correct symbols, so this needs to be fallible (with a core::fmt::Error):
interpolator.format_with_pattern(&pattern, "2023-10-25".parse().unwrap());
Any additional feedback before I implement this?
- @robertbastian -
newshould move the locale, or else it should borrow the locale with a lifetime - @sffc - This is not consistent with TimeZoneFormatter but I'm fine diverging (and perhaps making a 2.0 to fix TimeZoneFormatter)
- @robertbastian - TZF takes a reference and clones, that's the worst solution out of these 3
Naming bikeshed for DateTimePatternInterpolator:
DateTimeSymbolsDateTimeSymbol[s]BagDateTimeDataDateTimeDataBagDateTimeSymbol[s]CacheDateTimeSymbol[s]LoaderDateTimeSymbol[s]HolderDateTimePatternFormatterDateTimeSymbol[s]StoreDateTimeFormatInfoDateTimePatternSymbols
- @sffc - Not a fan of
DateTimeSymbolsdue to ICU4C baggage with that name. Not a fan ofBagbecause we use that word for options bags. Not a fan ofCachebecause we advertise caches as something that lives outside of ICU4X.Storeis okay but we use that word as a trait name in other places. - @robertbastian - Not a fan of
Databecause it's a filler word - @Manishearth - It should be named for what it's for, not what it's used internally. People should find it if they want to format patterns directly. The key thing is patterns, not sumbols.
Will temporarily go with:
DateTimePatternSymbolswhich owns symbolsDateTimePatternFormatterwhich borrows the symbols and the pattern
LGTM: (@sffc) (@Manishearth)
My PR #4204 uses the name TypedDateTimePatternInterpolator as proposed in the OP. It doesn't do everything proposed yet so the name is still open.
Some other ideas:
DateTimeNamesDateTimeSymbolDataDateTimeFormatterDataBorrowedDateTimeNamesWithPattern
Also we can add helper functions on DateTimeNames to get specific symbols like the AM/PM symbol.