icu4x
icu4x copied to clipboard
Consider exact policy around DateTimeError in datetime formatting
Currently there are 3 places where errors can be returned in datetime formatting:
- Constructor:
DateTimeFormatter::try_new- Example: locale missing
- Format function:
DateTimeFormatter::format- Example: mismatched calendar
- Write function:
FormattedDateTime::write_to- Example: data for pattern not loaded
Note that case 3 is a core::fmt::Error whereas cases 1 and 2 are DateTimeError. Any errors that occur during write_to are logged with log::warn! and then flattened to a core::fmt::Error.
Currently case 3 does not happen in normal public API usage of the icu::datetime crate. However, it is now possible to reach this case in DateTimePatternInterpolator (#3347).
If we moved this error to phase 2, we may end up slowing down the path for all users since we need to walk the whole pattern in order to figure out if the data is missing.
We should decide on this before shipping DateTimePatternInterpolator.
Want to discuss with:
- @Manishearth
- @robertbastian
- @sffc
It should be noted that the ::format function has an error only in DateTimeFormatter, not in TypedDateTimeFormatter, and the only error it returns is a mismatched calendar. So, any additional pre-processing we do would result in changing the signature of the ::format functions on TypedDateTimeFormatter, which we could do in 2.0. However, it might be nice to avoid this.
I'd like to proceed with the following solution:
- Change the experimental
DateTimeNames::with_patternto two functions:DateTimeNames::try_with_pattern, which checks that the names are loaded correctly for the patternDateTimeNames::with_pattern_unchecked, which has the current behavior (the function is unchecked but not unsafe)- And keep the existing
load_for_patternandinclude_for_patternthe way they are.
- If a DateTimeError occurs during writing, which shall be possible only if an "unchecked" function was called without its invariants upheld, do the following:
debug_assert!(false)with the error- Log the error using the internal logging utility (it currently does this)
- Only then fall back to
core::fmt::Error, which only happens without debug assertions, and which is silent only if logging is disabled
- Document that the only error in
DateTimeFormatter::formatis a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.
OK?
sgtm
Using core::fmt::Error means code like formatted_date_time.to_string() panics, so I don't think we should ever use it. We should GIGO, in debug mode we debug-assert, but in release mode we return Ok and write something like <missing xy-data>.
(edit: this is actually a requirement of the trait)
Document that the only error in DateTimeFormatter::format is a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.
This is why I don't like the error enums. We have so many methods that can only error with a subset of variants, but the client always has to check all of them. Documentation is not a strong enough guarantee imho.
Using
core::fmt::Errormeans code likeformatted_date_time.to_string()panics, so I don't think we should ever use it. We should GIGO, in debug mode we debug-assert, but in release mode we returnOkand write something like<missing xy-data>.
Ok how about writing the string "(icu4x error)"? I don't want to carry around the Display impl for the Error type in release mode, and you get the whole error with debug assertions enabled.
Document that the only error in DateTimeFormatter::format is a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.
This is why I don't like the error enums. We have so many methods that can only error with a subset of variants, but the client always has to check all of them. Documentation is not a strong enough guarantee imho.
I mostly agree. Not proposing to change anything right now.
We should use a different garbage string for each code location. (icu4x error) is not actionable.
The formatters share the same code path.
Neo: https://github.com/unicode-org/icu4x/blob/978ed24845299cad5a58a928ef85b7b2e8928ed8/components/datetime/src/raw/neo.rs#L411
Neo Pattern: https://github.com/unicode-org/icu4x/blob/978ed24845299cad5a58a928ef85b7b2e8928ed8/components/datetime/src/format/neo.rs#L1178C6-L1178C6
Old: https://github.com/unicode-org/icu4x/blob/978ed24845299cad5a58a928ef85b7b2e8928ed8/components/datetime/src/format/datetime.rs#L78
The best we could do without an enum would be something like (error in icu::datetime::TypedDateTimeFormatter)
The GIGO doesn't have to happen at the top level that you linked. write_pattern should be infallible, and then write_field would do garbage out instead of returning errors: https://github.com/unicode-org/icu4x/blob/main/components/datetime/src/format/datetime.rs#L239. This lets us use different messages for different fields.
Hmm, replumbing the formating code is a fairly big change
My main concern with the Display impl is code size. We could avoid this by using a different error struct internally that carries a static str with the additional context
Or are you proposing that we produce outputs such as "(icu4x error: weekday names not loaded), 6 (icu4x error: month names not loaded) 2024" (instead of "Saturday, 6 January 2024")
Yes, that
IMO "{?}, 6 {?} 2024" is more user friendly than "(icu4x error: weekday names not loaded), 6 (icu4x error: month names not loaded) 2024"
What are your criteria for "user friendly"? {?} certainly looks nicer, but this is an error, it shouldn't look nice. We also want to communicate which names are not loaded; a bunch of {?} are going to be harder to fix, making it less user friendly.
I think the user persona in your mind is a developer, and in my mind, end user of a product that utilizes ICU4X. Both personas are important and they should receive output adapted to their needs.
but this is an error, it shouldn't look nice.
The developer seeing an erroneous output should be able to act on it, I agree. Your sample serves that well.
The end user of a web browser cannot act on the error, and should see broken output in context of their UI. I believe my sample serves that better. The end user may also get a text error in their debug console, or some telemetry may send error data to developer for debugging, but the UI impact should be low.
The use case for this message is DateTimeNames::with_pattern_unchecked. The unchecked part clearly communicates to the developer that they have to be careful with this and verify the output before any user ever sees it. This is why I'm primarily thinking about communicating to the developer instead of an end-user.
I realized that this is basically the same problem space as "GMT+?" (#2245)
Along those lines, we could make up pithy error strings for each field that are both user friendly and searchable. For the month field:
- "M?"
- "month?"
- "ICU4X:M?"
Wdyt?
Discussed in meeting to a partial conclusion
-
@robertbastian is it guaranteed that we have a number when we don't have patterns/names/whatever?
-
@manishearth only cases where DTF might not have a value from the date is cyclic/related iso. totally fine to ? in those cases, like time zones
-
@robertbastian - so the issue is missing symbol names, so if the data is there we can find a neutral way to display it
-
@robertbastian: original idea was <ICU4X error: month symbols missing>¿month 01? 6, 2024"
-
@robertbastian: shorter is "¿M01? 6, 2024". less clear it's broken
-
@zbraniecki: Do not want prod users seeing ICU4X error. it's meaningless. Ok with ¿? or brackets or whatever.
Ideas: "(M01) 6, 2024" "{M01} 6, 2024" "[M01] 6, 2024" "<M01> 6, 2024" "|M01| 6, 2024" "💥M01💥 6, 2024"
- @manishearth: parentheses are used in normal patterns so let's not use them to signal errors
- @robertbastian: could do:
- "|M01| 6, 2024" in prod
- "|ICU4X MISSING MONTH PATTERN M01| 6, 2024" in debug
- Can cause problems in tests if people only test debug or release mode.
Agreed (amongst @zbraniecki, @robertbastian, @manishearth):
- No scary strings in release/prod mode.
- Fine to have scary string in debug mode
- Shane can pick whatever brackets he wants
@sffc is this okay with you (and if so, please remove the discuss label)
I'd prefer to not include the month number because the error-state code path should be as minimal as possible; formatting "M01" is more complicated than "M" (even if only a little bit more complicated) and I want to start from the position of being as stripped-down as possible.
I am happy with 💥M or 〚M〛 or ⦃M⦄ or «M» or {M}
@zbraniecki How strongly would you like the month number to be rendered?
Discussion with @zbraniecki:
- MonthCode is available to the formatting function as a TinyStr so we can just echo it. Same with era code.
«M01»,«gregory» - For numeric fields if the FixedDecimalFormatter is not loaded, load the FixedDecimalFormatter in no-data mode, but still indicate the error using brackets
- For weekday:
«E3» - For day period:
«am»«pm»
For numeric fields if the FixedDecimalFormatter is not loaded, load the FixedDecimalFormatter in no-data mode, but still indicate the error using brackets
Why not use Writeable for u8?
Also I just realised
- Fine to have scary string in debug mode
We're debug_assert!ing, so debug mode will panic anyway.
For numeric fields if the FixedDecimalFormatter is not loaded, load the FixedDecimalFormatter in no-data mode, but still indicate the error using brackets
Why not use
Writeable for u8?
- This is internal code and we can change it at any time; it does not impact the observable behavior
- Fewer branches / code paths; we just change the
Option<FixedDecimalFormatter>toFixedDecimalFormatterthat defaults to one that performs the desired error-like output - Although small,
impl Writeable for u8has some code size, and it is not an impl that is necessarily going to be present in the binary (https://github.com/unicode-org/icu4x/blob/main/utils/writeable/src/impls.rs) - Why not? Less code, fewer if statements, better maintainability.
The remaining work is semver breaking, so moving this to 2.0.