icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Consider narrower error type between try_new_from_codes and try_from_fields

Open sffc opened this issue 3 months ago • 35 comments

issue: these new variants are only returned by from-fields, not by constructors on Date (which is what this type is for). please lets not have catch-all errors again

Originally posted by @robertbastian in https://github.com/unicode-org/icu4x/pull/6910#discussion_r2363319299

sffc avatar Oct 01 '25 01:10 sffc

Also consider making an error module as suggested by @Manishearth in https://github.com/unicode-org/icu4x/pull/7012#discussion_r2407180399

sffc avatar Oct 07 '25 02:10 sffc

ok so in terms of new APIs in 2.1, it is only Date::try_from_fields that needs to have a new error type. We'll need to answer this question for Date::try_add in 2.2, unless we somehow find time to graduate those APIs in 2.1, which I don't think will happen.

Sadly Date::try_new_from_codes returns a type called DateError; ideally it would have been named DateFromCodesError. Maybe in 3.0 (or in 3.0 we can just delete that function).

Here is DateError:

pub enum DateError {
    Range {
        /// The field that is out of range, such as "year"
        field: &'static str,
        /// The actual value
        value: i32,
        /// The minimum value (inclusive). This might not be tight.
        min: i32,
        /// The maximum value (inclusive). This might not be tight.
        max: i32,
    },
    UnknownEra,
    UnknownMonthCode(MonthCode),
    InconsistentYear,
    InconsistentMonth,
    NotEnoughFields,
}

The last three variants are the new ones reachable from Date::try_from_fields rather than Date::try_new_from_codes.

There are two things I could do:

  1. Make pub enum DateFromFieldsError with a variant DateError plus the other three new ones.
  2. Make pub enum DateFromFieldsError with all six variants.

In both cases, I'll revert DateError back to 2.0 state.

Thoughts @robertbastian ?

sffc avatar Oct 14 '25 04:10 sffc

I'm leaning toward expanding all six variants, because I think we may eventually want to get rid of DateError, and because I think I want to make Range into its own struct DateRangeError.

Maybe I want to split UnknownMonthCode, too, since it is used in a lot of different situations.

So then we get:

pub enum DateFromFieldsError {
    Range(DateRangeError),
    UnknownEra,

    /// MonthCode syntax is invalid
    InvalidMonthCode(MonthCode),
    /// MonthCode doesn't exist in the calendar
    UnknownMonthCodeInCalendar(MonthCode),
    /// MonthCode doesn't exist in the specific year
    UnknownMonthCodeInYear(MonthCode),

    InconsistentYear,
    InconsistentMonth,
    NotEnoughFields,
}

Ideally we wouldn't need InvalidMonthCode because ideally MonthCode would check its syntax on construction, but for some reason MonthCode has a public TinyStr field, which means this error case is reachable in pretty much all of our APIs that take a MonthCode. Hmm, should I change DateFields to take a &str instead of MonthCode?

sffc avatar Oct 14 '25 04:10 sffc

and because I think I want to make Range into its own struct DateRangeError.

Already the case: https://unicode-org.github.io/icu4x/rustdoc/icu/calendar/struct.RangeError.html

But we can't wrap it withotu it being a breaking change. File-and-forget 3.0 issue, I say.

Maybe I want to split UnknownMonthCode, too, since it is used in a lot of different situations.

I would love to differentiate between "you gave me a crap month code", "you gave me a month code that does not work for this calendar" and "you gave me a month code that does not work for this date". Mostly between the second two, but differentiating between Mabc and M05L may be useful.

Hmm, should I change DateFields to take a &str instead of MonthCode?

I think to some extent the choice of MonthCode having a public field was since the month code format may extend in the future. PErhaps that was a wrong call.

I think we should still take a MonthCode, it's fine if we validate it in-code.

Manishearth avatar Oct 14 '25 06:10 Manishearth

It should be an error with 6 variants, I'd like our errors to be as flat as possible. I'd also want the range variant to be the same as in DateError, not wrapping a RangeError, as that's another level a caller would have to navigate.

Hmm, should I change DateFields to take a &str instead of MonthCode?

I think that makes sense, we also accept eras as &str. TinyAsciiStrs are convenient for return values, because then there's no life time, but for function parameters they are just annoying to construct.

robertbastian avatar Oct 14 '25 08:10 robertbastian

Using &str also makes FFI more aligned, as there we already return UnknownMonthCode if the &str fails to parse into a MonthCode; whereas in Rust the client has to handle that separately.

robertbastian avatar Oct 14 '25 08:10 robertbastian

Hmmm, that argument is compelling.

I dislike having lifetimes on input structs but it's ... fine.

Manishearth avatar Oct 14 '25 16:10 Manishearth

We already have a lifetime on it for the era code.

sffc avatar Oct 14 '25 16:10 sffc

oh, then i'm convinced

Manishearth avatar Oct 14 '25 17:10 Manishearth

Does not appear to need discussion since there is rough alignment on Shane's plan above, remaining discussion can happen on PRs.

Manishearth avatar Oct 15 '25 03:10 Manishearth

Assigning Shane for milestone-tracking reasons since he is currently working on. Can be reassigned after #7100

Manishearth avatar Oct 15 '25 03:10 Manishearth

After #7100, still want to add more testing around making sure the correct month code error type is returned in various month code scenarios.

sffc avatar Oct 15 '25 08:10 sffc

Some open questions from #7100:

  • (2.1 blocking) Name of the month code error variants. I have InvalidMonthCode, UnknownMonthCodeForCalendar, UnknownMonthCodeForYear. Please paint the shed. https://github.com/unicode-org/icu4x/pull/7100#discussion_r2431986264
  • (2.1 blocking) Should InvalidMonthCode and UnknownMonthCodeForCalendar be merged? https://github.com/unicode-org/icu4x/pull/7100#discussion_r2431976060
  • (2.1 blocking) Should DateFromFieldsError::Range be expanded into fields? https://github.com/unicode-org/icu4x/pull/7100#discussion_r2431977477
  • (non-blocking) Should we tweak the error cases in EcmaReferenceYearError? https://github.com/unicode-org/icu4x/pull/7100#discussion_r2431989858

sffc avatar Oct 15 '25 21:10 sffc

For the first two points I think UnknownMonthCodeForCalendar (maybe MonthCodeNotInCalendar) and MonthCodeNotInYear is good, with no separate InvalidMonthCode

Should DateFromFieldsError::Range be expanded into fields

I like consistency, weak "yes".

Should we tweak the error cases in EcmaReferenceYearError

I think the current set of InvalidMonthCode, UnknownMonthCodeForCalendar, NotEnoughFields is decent, we can remove the first one if we remove it for the other errors.

Manishearth avatar Oct 15 '25 22:10 Manishearth

My initial positions:

  • (2.1 blocking) Name of the month code error variants. I have InvalidMonthCode, UnknownMonthCodeForCalendar, UnknownMonthCodeForYear. Please paint the shed.

How about:

  • DateFromFieldsError::MonthCodeSyntax
  • DateFromFieldsError::MonthCodeNotInCalendar
  • DateFromFieldsError::MonthCodeNotInYear

(this is similar to @Manishearth's suggestion but I came up with it without reading his comment first which means it is more likely to be intuitive)

Weak preference to keep them separate, because they are distinct errors, even if the way to handle them is usually the same. Let users merge them if they want to.

I could similarly be convinced that we should have a EraSyntax error.

Medium-strong preference to not inline the fields, because I think it is useful to be able to retain a value of type RangeError and do things with it. Consistency carries very little weight because I don't see DateError as a model to be followed. I can even just go ahead and mark it deprecated right now if that resolves the concern about consistency.

I think returning Option<Result> is better, which eliminates one error case. We get rid of the other when we adopt ValidMonthCode.

sffc avatar Oct 15 '25 23:10 sffc

Also, I wanted to highlight another question, just to make sure: if ordinal month is used as input, and it is too big (or zero), do we agree to return a Range error as opposed to a MonthCode error?

(medium-strong preference to return a range error)

sffc avatar Oct 15 '25 23:10 sffc

Also, I wanted to highlight another question, just to make sure: if ordinal month is used as input, and it is too big (or zero), do we agree to return a Range error as opposed to a MonthCode error?

Good catch. I think so? It might mean we need to keep track of a few more things, but yes I think we should return a Range error here.

Manishearth avatar Oct 16 '25 00:10 Manishearth

Starting point:

#[non_exhaustive]
pub enum DateFromFieldsError {
    Range(RangeError),
    UnknownEra,
    InvalidMonthCode,
    UnknownMonthCodeForCalendar, // MonthCodeNotInCalendar
    UnknownMonthCodeForYear, // MonthCodeNotInYear
    InconsistentYear,
    InconsistentMonth,
    NotEnoughFields,
}

enum DateError {
    Range {
        field: &'static str,
        value: i32,
        min: i32,
        max: i32,
    },
    // ...
}

Items to decide:

  1. Change UnknownMonthCodeForCalendar to MonthCodeNotInCalendar
  2. Change UnknownMonthCodeForYear to MonthCodeNotInYear
  3. Fold InvalidMonthCode into MonthCodeNotInCalendar?
  4. Keep RangeError as a field of the Range variant
  5. Return Range errors for out-of-bounds ordinal months

Discussion:

  • (introduces issues)
  • @sffc Do we have consensus on 1 and 2?
  • @Manishearth I think so
  • @zbraniecki Yeah
  • @Manishearth 3 is interesting. The month code syntax could be expanded. The Chinese calendar won't get new month codes, though. When the user sees "invalid month code" versus "month code not in calendar", there isn't an actionable difference. But there is a big actionable difference between "month code not in calendar" and "month code not in year", since it indicates that you have inconsistent data, or things that could happen if you're changing lunar approximations.
  • @zbraniecki I'm thinking through the edge case. Would the developer want to show the user that the month is invalid?
  • @Manishearth Month codes are a developer concept. A developer had a role in producing a month code.
  • @zbraniecki: Keeping the distinction makes sense because the path to the errors is differet. I don't think an ICU4X user would handle it differently.
  • @zbraniecki If I'm building a system and the month code has a syntax error, I'm looking at where the month code came from. If I get month code not-in-calendar, I'm looking at why I'm matching the code with the wrong calendar.
  • @sffc If we keep 3, let's bikehed the name. MonthCodeInvalidSyntax?
  • @manishearth and @zbraniecki: sure
  • @manishearth maybe we should document that syntax may change
  • @sffc: starter position: when we add new APIs here i don't want to add more DateError users, we should add narrow errors. That said we have it now and it's returned from try_new_from_codes and Date::try_new_{chinese, japanese, dangi}.
  • observation: RangeError is not #[non_exhaustive].
  • @manishearth: maybe have a new good RangeError type.
  • @manishearth the two reasons for inlining as an enum struct to me are consistency. DateError already needs improvement so I'm not convinced about trying to be consistent with it. Matching... I expect most times people will be doing Range(..) and if they need to sub-match they may even need a
  • @sffc matching on errors is already a power user thing. Importing RangeError seems like a small cost.
  • @zbraniecki: No strong position. We probably want to make RangeError non-exhaustive in 3.0, right?
  • @manishearth yes and replace the string with an enum
  • @sffc And it will be less churn to just change RangeError in 3.0, instead of having to change DateFromFieldsError, too
  • Agreement on 4
  • @sffc I think since ordinal month are a different field, they can and should use a different error type
  • @Manishearth I think debugging process is different because numbers are more likely to get piped through
  • @Manishearth @zbraniecki @sffc agree on 5
  • @manishearth let's document the error cases.

Conclusion:

  • month code errors are MonthCodeNotInCalendar, MonthCodeNotInYear, MonthCodeInvalidSyntax
  • Range error is Range(RangeError) (not an inlined struct variant). For 3.0 consider making RangeError be non_exhaustive and changing the string to an enum.
  • out-of-range ordinal months return Range

LGTM: @sffc @Manishearth @zbraniecki

sffc avatar Oct 16 '25 18:10 sffc

I don't think we should start designing our APIs with 3.0 changes in mind. We have no timeline for 3.0 and I don't want to be inconsistent for an indeterminate amount of time.

DateError::Range was added in our last release. We shouldn't already consider it wrong and plan to change it, we should just accept that that's what we did and do the same for DateFromFieldsError.

robertbastian avatar Oct 20 '25 08:10 robertbastian

I don't see consistency as a strong motivator here. Even if we don't plan to make a change in 3.0 I think the inline error is not as good for this type.

Manishearth avatar Oct 20 '25 10:10 Manishearth

I don't see consistency as a strong motivator here.

Well I do. You both approved DateError::Range when I implemented fine-grained error types in this crate last year, so it's weird that you both now insist that "it's not as good", given that these are the same variants.

robertbastian avatar Oct 20 '25 10:10 robertbastian

Conceptually, DateFromFieldsError should be the same as DateError, with the added variants InconsistentYear, InconsistentMonth, and NotEnoughFields. I don't understand why we're now redesigning both RangeError and UnknownMonthCode. We just released these variants in our last release.

For someone migrating from from-codes to from-fields the three new variants are very easy to understand and dismiss. But the new month error variants are not an easy migration, and the new RangeError shape is just a pointless change because you now prefer a different style than last year. What justifies this churn?

robertbastian avatar Oct 20 '25 13:10 robertbastian

A single UnknownMonthCode error variant was objectively an oversight, since we have a use case for it now that we didn't have last year. So, there's already not a superset relationship.

It's unfortunate that we shipped this error variant in 2.0 and are discovering in 2.1 that we made design mistakes. I don't like whataboutisms, but we are also changing 2.0 design with time zone variants and calendar constructors. 2.1 is no worse a time to make this change than 2.2 or 2.3.

If someone is migrating from from-codes to from-fields, learning about the new error variants is a good thing.

sffc avatar Oct 20 '25 19:10 sffc

weird that you both now insist that "it's not as good", given that these are the same variants.

I consider the inline enum variant to be perfectly acceptable. I had not considered the other model when reviewing it the first time, and at the moment I feel like there are good reasons to prefer not inlining the struct (but it's weird to say i'm "insisting" that, I have not been expressing strong opinions here).

I do think "let's not churn" is a strong motivator to stick to the existing model.

Manishearth avatar Oct 20 '25 19:10 Manishearth

Sorry but what's the "churn"?

If it's people migrating to from-fields in 2.1, this isn't measurable at all.

If we want to reduce churn, then we should use the nested error so that we don't need to change the enum in 3.0 when we improve RangeError.

sffc avatar Oct 20 '25 20:10 sffc

I think Robert is opposed to "we will change this in 3.0" being a foregone conclusion. That's definitely churn.

Manishearth avatar Oct 20 '25 20:10 Manishearth

Ok, aside from the consistency and churn arguments,

It's completely reasonable for code to want to have a code path that works in terms of RangeError. There are multiple APIs that return it, including calendar specific constructors, and the future arithmetic API. You should be able to write a function like

fn process_range_error(err: RangeError)

And it should be easy to use from all the APIs that return range errors.

sffc avatar Oct 20 '25 20:10 sffc

Let's say for the sake of argument that the consistency and churn arguments cancel out, since there are potentially reasonable positions on both sides.

Looking then at technical merit:

Advantages of nested RangeError:

  • Better code sharing between functions that return RangeError
  • Allows for the variant to be non-exhaustive

Advantages of expanded RangeError:

  • Avoids having to import an extra type when matching on it

Did I miss anything?

sffc avatar Oct 20 '25 20:10 sffc

Not that I know of.

Manishearth avatar Oct 20 '25 23:10 Manishearth

I do think that this is at the level of "reasonable people may disagree on the 'best' API" where I would weigh consistency and churn avoidance more than trying to find the best consensus API, though.

Manishearth avatar Oct 20 '25 23:10 Manishearth