icu4x
icu4x copied to clipboard
Neo date formatter: Options<R> vs R::Options and same for .format
Currently in #5248 I implemented
/// Options bag for datetime formatting.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
#[non_exhaustive]
pub struct NeoOptions<R: DateTimeMarkers> {
/// The desired length of the formatted string,
/// if required for the chosen field set.
///
/// See [`NeoSkeletonLength`].
pub length: R::LengthOption,
}
where R::LengthOption is set according to the field set. It is usually NeoSkeletonLength but it is a zero-sized type for field sets that don't need the length.
I intend to add at least two more options:
R::HourCycleOptionfor runtime hour cycle choicesR::EraDisplayOptionfor runtime era display choices
Another approach would be to instead create separate options bags for each combinations of possible options. For example:
#[non_exhaustive]
pub struct DateWithYearOptions {
pub length: Length,
pub era_display: EraDisplay,
}
#[non_exhaustive]
pub struct DateWithoutYearOptions {
pub length: Length,
}
#[non_exhaustive]
pub struct TimeOptions {
pub length: Length,
pub hour_cycle: HourCycle,
}
// note: same as DateWithoutYearOptions; could be combined
#[non_exhaustive]
pub struct ZoneWithRuntimeLengthOptions {
pub length: Length,
}
#[non_exhaustive]
pub struct ZoneWithConstLengthOptions {
}
#[non_exhaustive]
pub struct DateTimeWithYearOptions {
pub length: Length,
pub era_display: EraDisplay,
pub hour_cycle: HourCycle,
}
and then R::Options would choose the correct struct.
Pros and cons:
NeoOptions<R>makes it possible/easier to add more options in the future in a non-breaking wayNeoOptions<R>is arguably more consistent with the neo formatter design which has aimed at lots of markers/traits but very few concrete typesR::Optionsis probably more clear in documentation because you are constructing a real struct with specific fields, rather than a weird struct whose fields depend onR
I also wanted to raise the possibility of applying a similar treatment to the format<I>(&self, input: &I) function. Currently we have a trait that looks like this:
pub trait AllInputMarkers<R: DateTimeMarkers>:
NeoGetField<<R::D as DateInputMarkers>::YearInput>
+ NeoGetField<<R::D as DateInputMarkers>::MonthInput>
+ NeoGetField<<R::D as DateInputMarkers>::DayOfMonthInput>
+ NeoGetField<<R::D as DateInputMarkers>::DayOfWeekInput>
+ NeoGetField<<R::D as DateInputMarkers>::DayOfYearInput>
+ NeoGetField<<R::D as DateInputMarkers>::AnyCalendarKindInput>
+ NeoGetField<<R::T as TimeMarkers>::HourInput>
+ NeoGetField<<R::T as TimeMarkers>::MinuteInput>
+ NeoGetField<<R::T as TimeMarkers>::SecondInput>
+ NeoGetField<<R::T as TimeMarkers>::NanoSecondInput>
+ NeoGetField<<R::Z as ZoneMarkers>::TimeZoneInput>
where
R::D: DateInputMarkers,
R::T: TimeMarkers,
R::Z: ZoneMarkers,
{
}
which is blanked-implemented on all T satisfying its bounds. We could instead have
#[non_exhaustive]
pub struct Input<R> {
pub year: R::YearInput,
pub month: R::MonthInput,
// ...
}
Pros and cons:
pub trait AllInputMarkers<R>allows users to pass foreign types directly into the format function so long as they implement the traitpub struct Input<R>makes the format function less generic (instead of being generic in bothRandI, now it is generic only inR), probably resulting in smaller code sizepub struct Input<R>allows clients to more easily drive the formatter even without usingicu_calendar::Dateor one of the other official date types
Thoughts?
@Manishearth @robertbastian @zbraniecki
Discussed with Shane yesterday. In general I'm in favor of having the intermediate type with the user calling .into() or whatever, or having a single trait.
Both of these add some indirection, but they're more easily documented over a pile of traits. IMO functions like format() should spend ink documenting how they are used, without having to go too deep into "how to make the traits work". Having an intermediate type or trait gets us over the finish line with a dedicated place to describe and untangle the trait mess.
Between having an intermediate type or trait, I find that having a type tends to be less messy, and probably better for monomorphization.
In addition to hour cycle and era display, I also need an option for fractional second digits, unless that gets modeled as part of the field set.
Would like comments from @robertbastian and @zbraniecki
I'm in favour of R::Options for usability.
Discussion on constructors, also with implications on NeoOptions:
use icu::datetime::fieldsets::YMD;
use icu::datetime::DateComponents;
icu::datetime::Formatter::<YMD>::try_new(locale, options)
icu::datetime::Formatter::try_new_with_components(locale, DateComponents::YearMonth, options)
// Zibi prefers:
icu::datetime::Formatter::try_new(locale, YMD, options)
// Shane: It can also be written as:
icu::datetime::Formatter::<YMD>::try_new(locale, YMD, options)
// Robert: This would give gnarly errors
icu::datetime::Formatter::<YMDT>::try_new(locale, YMD, options)
// Robert: This actually works if the type can be inferred, i.e. it's declared in a field or argument
let formatter: Formatter<YMD> =
icu::datetime::Formatter::try_new(locale, option)
// Shane: can we keep the old signature around?
icu::datetime::Formatter::<YMD>::try_new_bikeshed(locale, options)
// Shane: If we go with NeoOptions<R>, we could make it so that you write something like
icu::datetime::Formatter::try_new(locale, YMD.with_length(Length::Long) /* NeoOptions<YMD> */)
- @robertbastian You see these generics a lot. You can hide it, but you can't hide it when you pass it to a function, for example. And you see the type a lot in error messages.
- @zbraniecki For such a high-profile type, I don't think we necesarily want to force people to learn about how the generics work before they can even use it.
- @sffc We could add another constructor signature?
- @nekevss Could you put the field set into the options?
- @sffc Yes, if we go with
NeoOptions<R> - @robertbastian What do we use in docs?
- @sffc I slightly prefer using the turbofish in docs but if people think
YMD.with_lengthor similar is better, we can do it - @robertbastian I like fully specifying a type, but not a turbofish on function calls. I really don't like
YMD.with_lengthbecause it goes out of the way to hide that the type has a generic. - @Manishearth I don't think users need to understand they are different types.
- @sffc I think that Zibi's position about learnability is important. I think the type parameter is important for people to know about eventually, but not in their initial use. People are used to creating a DateTimeFormatter with fields specified as options: this is how Intl works, it is how ICU4C works and ICU4J and most other formatting libraries. So it makes sense to me that there should be a way to specify the field set as a parameter in the constructor.
Constructor signature proposal:
- The options type is one of the following:
Options<FSet>with afield_setfieldFSetOptions, one type per field set marker
- All fieldsets implement
Defaultand are inhabited ZSTs (have exactly 1 possible value) - fieldset types gain convenience methods like
YMD.with_length(..)etc - We document three constructor call sites:
Formatter::<YMD>::try_new(locale, Options {...})Formatter::try_new(locale, YMD.with_length(..))Formatter::try_new(locale, Options { ..., fieldset: YMD })(we don't need to display this in any docs, but people can do this if they want)
- In our docs we can choose to show different techniques if needed. We pick the first two techniques to primarily use in docs: the type param by default, and the helpers where possible.
- Auto-generated docs tests on marker types will use the first (type param) method.
LGTM: @Manishearth @sffc
New proposal:
Formatter::<YMD>::try_new(locale, Options {...})is the primary constructor- If options design has one-type-per-fieldset (
YMDOptions, or just haveYMDbe the options type), or a single options type generic over fieldsets (Options<YMD>)- We add
YMD.with_length(..)convenience options builders which will produce an appropriately-constrained options type - We can also document
Formatter::try_new(locale, YMD.with_length(..)). - We primarily still use the turbofish ctor, but use the
with_length()convenience ones in some docs where possible.
- We add
- Else, if we have five options types
DateOptionsetc, they gain a generic and afield_setfield, so you can sayDateOptions {...field_set: YMD}if you choose, but also it can be defaulted as usual.- Downside: now we have generics on these options types and really don't need them usually. However, the choice of fieldset is something that can be considered an option so it's not too weird. (Check if @robertbastian is ok with this constraint)
LGTM: @Manishearth @sffc
Need input from @robertbastian @zbraniecki
The main thing that we discussed in the meeting after people left was the "else" block there, which proposes a way to decouple the ctor discussion from the options discussion, however it constrains the options discussion by giving DateOptions (etc) a generic in the "we have five options types" model
I want to explore the field set being the options type.
We would get call sites like this:
icu::datetime::Formatter::try_new(locale, YMD::with_length(Length::Medium))
The function signature is
impl Formatter<FSet> {
pub fn try_new(Preferences, FSet)
}
And the field set would be defined like this (most likely expanded from a macro)
#[non_exhaustive]
pub struct YMD {
pub length: Length,
pub era_display: EraDisplay,
}
For runtime components, it would be
#[non_exhaustive]
pub struct DateSkeleton {
pub field_set: DateFieldSet,
pub length: Length,
pub era_display: EraDisplay,
}
Assuming it works, I can agree to this.
This would produce good documentation, so I'm onboard.
We haven't reached consensus on the input types.
Currently, we have:
impl<C: CldrCalendar, R: DateTimeMarkers> TypedNeoFormatter<C, R>
where
R::D: DateInputMarkers,
R::T: TimeMarkers,
R::Z: ZoneMarkers,
{
pub fn format<I>(&self, input: &I) -> FormattedNeoDateTime
where
I: ?Sized + IsInCalendar<C> + AllInputMarkers<R>,
{
let input = ExtractedInput::extract_from_neo_input::<R::D, R::T, R::Z, I>(input);
FormattedNeoDateTime {
pattern: self.selection.select(&input),
input,
names: self.names.as_borrowed(),
}
}
}
pub trait AllInputMarkers<R: DateTimeMarkers>:
NeoGetField<<R::D as DateInputMarkers>::YearInput>
+ NeoGetField<<R::D as DateInputMarkers>::MonthInput>
+ NeoGetField<<R::D as DateInputMarkers>::DayOfMonthInput>
+ NeoGetField<<R::D as DateInputMarkers>::DayOfWeekInput>
+ NeoGetField<<R::D as DateInputMarkers>::DayOfYearInput>
+ NeoGetField<<R::D as DateInputMarkers>::AnyCalendarKindInput>
+ NeoGetField<<R::T as TimeMarkers>::HourInput>
+ NeoGetField<<R::T as TimeMarkers>::MinuteInput>
+ NeoGetField<<R::T as TimeMarkers>::SecondInput>
+ NeoGetField<<R::T as TimeMarkers>::NanoSecondInput>
+ NeoGetField<<R::Z as ZoneMarkers>::TimeZoneOffsetInput>
+ NeoGetField<<R::Z as ZoneMarkers>::TimeZoneIdInput>
+ NeoGetField<<R::Z as ZoneMarkers>::TimeZoneMetazoneInput>
+ NeoGetField<<R::Z as ZoneMarkers>::TimeZoneVariantInput>
where
R::D: DateInputMarkers,
R::T: TimeMarkers,
R::Z: ZoneMarkers,
{
}
Problems with this design:
- We are using traits because we thought that third-party libraries could implement them. However, as observed previously, most third-party libraries can't actually implement these traits, since they don't have useful calendar information, for example.
- More traits means more monomorphization.
- Traits means you can't
.into(): you must explicitly name the type being formatted.
I would instead like to explore the following design:
pub struct NeoInput<R: DateTimeMarkers>
where
R::D: DateInputMarkers,
R::T: TimeMarkers,
R::Z: ZoneMarkers,
{
pub year: <R::D as DateInputMarkers>::YearInput,
pub month: <R::D as DateInputMarkers>::MonthInput,
// ...
}
This is reminiscent of NeoOptions.
Pros and cons:
- Pro: Monomorphization depends only on
R(the field set), not on the type being formatted - Pro:
.into()works - Pro: Third-party types can implement functions to convert themselves into a
NeoInput, which is a more straightforward task than figuring out all theGetInputimpls - Pro: More consistent with the other terminals such as
FixedDecimalFormatterandPluralRulesthat process types that have a bunch of conversion impls available - Con: Same reasons my original NeoOptions design didn't taste right to @robertbastian
- Pro and Con: Fields are eagerly evaluated
I put "fields are eagerly evaluated" as a pro and con. An advantage is that it has a more clean separation of responsibilities: field resolution happens above the ICU4X formatter membrane. A disadvantage is that it means we can't in the future refactor the code to lazy-evaluate the inputs.
I want to emphasize that the actual fields being read are determined by the field set R, not by the pattern, and that we are already reading fields eagerly. I'm not convinced that we will ever want lazy extraction:
- If fields are read in
.format: we need to iterate over the pattern in order to know which fields we need to read. But, we already iterate over the pattern inimpl TryWriteable, so it seems wrong to do it again in.format. It is likely slower, because there's a lot more branching involved: iterating over the pattern requires reading all the fields and pattern items with big enum matches. But maybe it could work. - If fields are read in
impl TryWriteablevia a type parameter onFormattedNeo: Bad because the monomorphization permeates much deeper, temporaries can't be passed into the format function, and it is a breaking change. - If fields are read in
impl TryWriteablevia a trait object: Bad because trait objects are bad for code size, temporaries, and breaking change.
I guess it is possible that we could have lazy extraction in .format (but probably not TryWriteable). So, that's the one thing my proposal would preclude.
@robertbastian @Manishearth
I discussed this a bit with @robertbastian and @Manishearth today. I didn't take notes, but here were some of my takeaways:
Creating struct Input<FSet> makes the bounds on Formatter a bit simpler, but it just moves the "trait soup" somewhere else. Types that want to convert themselves into Input<FSet> still need to have all the input bounds on their conversion functions.
I noted above that third-party datetime types can't really in practice implement GetInput, but I didn't give concrete examples. YearInfo and MonthInfo have a lot of fields and internal invariants. For example:
#[non_exhaustive]
pub struct YearInfo {
pub extended_year: i32,
pub kind: YearKind,
}
#[non_exhaustive]
pub enum YearKind {
Era(EraYear),
Cyclic(CyclicYear),
}
#[non_exhaustive]
pub struct EraYear {
pub formatting_era: Era,
pub standard_era: Era,
pub era_year: i32,
}
#[non_exhaustive]
pub struct CyclicYear {
pub year: NonZero<u8>,
pub related_iso: i32,
}
There are similar cross-field invariants between Time Zone and Metazone, between MonthInfo and DayOfMonth, between a future WeekOfYear and DayOfYear, etc.
A convenience function could be added to construct a YearInfo or MonthInfo from an ISO or Gregorian year or month number, and I consider it a trait invariant that the data returned from different GetInput calls be self-consistent. However, there's nothing enforcing it. @robertbastian pointed out that a custom type might work in English Gregorian but then break in some other locale and calendar system that wasn't tested ahead of time.
One possible resolution, which I resisted for a long time but am warming up to, is to simply accept the concrete icu_calendar types in the format functions. There are two ways we could do this, with their own unique pros and cons:
- Keep
GetInputbut make it sealed and only implemented on our blessed types - Create different
format_date,format_time,format_datetime, ... functions, taking the concrete types and enabled/disabled based onFSetbounds
In both cases, we lose the ability to format, for example, a MonthDay. GetInput currently allows one to create a custom type with only the subset of fields that FSet requires. If we find this to be a problem, we could just add more types to icu_calendar to meet these use cases.
Thoughts? @Manishearth @zbraniecki
I'd rather still retain the ability to integrate with other crates, and the ability for people to make their own MonthDay types.
Keep GetInput but make it sealed and only implemented on our blessed typ
This is going to still be a trait mess since we only have one formatter type so this would need to be GetInput<FS>
I'd rather still retain the ability to integrate with other crates, and the ability for people to make their own MonthDay types.
I think when we first designed this crate I thought the DateTimeInput trait was overkill and wanted to use our types. Since then I think I've moved on to liking it and thinking it helps keep the API small.
ICU4X Dates are not 100% cheap to construct, so I'm worried about other datetime libraries paying that cost. But I don't strongly oppose this, if people feel that we should simplify by sealing, I think that's ok for now.
ICU4X Dates are not 100% cheap to construct, so I'm worried about other datetime libraries paying that cost.
DateTime<Iso> is quite cheap to construct, and most third-party crates would only be able to support the Iso calendar anyway.
- Keep GetInput but make it sealed and only implemented on our blessed types
- Create different format_date, format_time, format_datetime, ... functions, taking the concrete types and enabled/disabled based on FSet bounds
If we're restricting inputs to a fixed set, we might as well use different methods. That way we can document each one properly, and the client sees which concrete type they have to construct from the signature.
I don't see MonthDay as such a big issue. If we have MD skeleton, you can just format a Date with it. In your data model you are 99% going to have a Date, not a floating MonthDay.
- @sffc and @robertbastian bring the group up to speed with their current thinking
- @Manishearth I don't really see third-party crates implementing the trait; I see them asking us to implement the trait. I could see new smaller crates implementing it. But I don't see large crates implementing it due to it being finnicky. But, I think we should have seamless integration. So if we have the traits, we should be prepared to implement them ourselves. And if we format a struct, we should be prepared to have From impls.
- @Manishearth I don't really like having 5 different format functions. I guess, if we're moving toward sealing the traits, we should be clearer which types go in. That either means doing the traits in a way where it's abundantly clear which traits implement it, or doing the functions in which case it's abundantly clear which types go in. I guess the functions are nicer, but I don't like having a billion functions.
- @robertbastian I think the motivation for these traits was that you could avoid duplicate computation. If Jiff creates a date from a timestamp, and then Jiff converts to an icu_calendar type, we might have to repeat the computation. But it's not clear that we trust Jiff to do the computations in the way we expect it to do. Since the traits make the API really confusing, I'm leaning toward separate functions.
List of formattable types:
icu::calendar::Dateicu::calendar::Timeicu::calendar::DateTimeicu::timezone::UtcOffset(proposed)icu::timezone::ResolvedTimeZoneicu::timezone::OffsetDateTime(proposed)icu::timezone::ResolvedZonedDateTime
More discussion:
- @Manishearth Do we think that the use case for formatting all of these is equally likely?
- @Manishearth We could potentially have functions for the main types, but keep the trait for the other things that could potentially be formatted. Explicitly mark
format_generic()as a power user API and say that if you want to formatDate,DateTime, ..., you should use the specific methods, and document other types that implement the Input trait on format_generic. - @sffc I don't feel that this list is unbounded. We may eventually add YearMonth and MonthDay. I don't see it getting bigger than ~10.
- @Manishearth I don't want to add 9 methods and committing to a design that requires that we add more methods when we want new types.
- @sffc Just noting, a
Formatter<YMD>will disable itsformat_timefunction via another trait bound of some sort. - @manishearth
where FS: TimeSet + DateSet? - @nekevss I see cases where you have parts you want to format and you want to pass the parts.
- @Manishearth If we seal the trait, we could unseal in the future when we have concrete use cases.
- @sffc An advantage of the concrete functions is that type inference works better.
- @Manishearth Also, with concrete functions, the docs are cleaner.
- @robertbastian I don't think having many format functions is bad. Users will load the docs and see the 5-10
format_*functions and it will be clear all the possible things they can pass in. - @Manishearth The concrete functions will need to exist over FFI anyway
Options:
- Single
formatfunction with current signature but sealed trait format_*functions for all formattable concrete types in theicu_calendarandicu_timezonecrates at the time of the ICU4X 2.0 release (likely implemented via private trait). If we add more types post-2.0, we will discuss whether adding them via trait functions or additional concrete format functions. There is a list above of the rough set of types that will be added, it will either be 5 types or 7 types.- A small selection of
format_*functions as well as aformat_genericfunction with the sealed trait
Discussion on the specific options:
- @hsivonen I don't have super informed opinions but in general lots of generics are maybe not great so
format_*seems more straightforward - @nekevss I'm also out of context, but option 2 seemed to sound the best based on the little context I have.
- @echeran I feel the same way as @nekevss
Decision: Option 2
LGTM: @robertbastian @sffc @manishearth
New WG discussion:
Options:
- Single format function:
format(impl IsInCalendar<C> + AllInputMarkers<FSet>) - Concrete format functions:
format_date(),format_date_time(),format_date_time_zone()
Discussion:
- @manishearth Would like generic to be exposed, if deprioritized (
format_generic?) so that people wishing to do sufficiently generic things wrapping ICU4X - @sffc slightly lean towards it being exported
- @robertbastian I think the specific methods are nice to keep users from being confused, otherwise if you have a
Formatterwhat does it format? - @manishearth that's compelling to me
- @robertbastian Fine with generic with an explicit list in the docs that list types it is compatible with
- @sffc Date, Time, DateTime, ZonedDateTime, TimezoneInfo, UtcOffset
break
- @robertbastian If we list out the six types in the docs I'm fine with that.
- @manishearth What is @sffc's opposition to specific functions
- @sffc 1. Weird to see a
format_time()on a date formatter. 2. more annoying to test/etc. - @robertbastian: what if we had separate formatter types
- @sffc and @Manishearth: this is closer to what we used to have, it was annoying
- @manishearth this would be a huge pain to work on. I think multiple format functions is less pain.
- @manishearth Proposal. Rustdoc is able to show only impl'd functions on alias docs:
pub struct Formatter<FSet>(FSet);
pub struct YMD;
struct Date;
impl<FSet> Formatter<FSet> {
pub fn format() {
}
}
// Stretch 1
impl Formatter<YMD> {
/// See aliases module for more information
pub fn format_date(x: Date) {}
}
// Stretch 2
type YMDFormatter = Formatter<YMD>;
Stretch goals can be done post 2.0 if desired.
- @manishearth: NOt a fan of format over format_generic but it's fine
Agreed: @sffc @manishearth @robertbastian
I split the stretch goal to https://github.com/unicode-org/icu4x/issues/5861. The rest of this issue is done.