chrono icon indicating copy to clipboard operation
chrono copied to clipboard

Consider refactoring chrono to avoid systematic misuse of panics

Open udoprog opened this issue 1 year ago • 7 comments

Many constructors in chrono has the behavior to panic on invalid inputs. This issue proposes to systematically change this behavior to instead make use of fallible constructors which in effect means the removal of the *_opt family of functions.

I'll be trying to make an argument for why below, at the end are the concrete changes I'm proposing.

Why should we do this?

The use of chrono has been the culprit of many unwanted panics in code that I've personally written in my career as a Rust programmer. The reason why this happens is sloppiness from my part. I use a method without realizing that it panics even though it says so in the documentation. But after consideration I've come to believe that the design of these methods could be improved.

The standard I abide by when it comes to panicking is broadly that correct Rust programs should not panic. So what I'll try and determine next is if uses of these constructors contributes to writing correct Rust programs that do or do not panic.

There are a few tests I'll be applying here each time one of these methods are used:

  1. Does the caller check the necessary preconditions before calling the function?
  2. Does the caller control the input to the function so that it's runtime invariant?
  3. Is the function call in a context where it's OK to panic. Such as a test case or a short lived CLI program.

Preconditions

If we look at the doc for most of these functions they contain wording such as TimeZone::ymd:

Panics on the out-of-range date, invalid month and/or day.

This is a difficult precondition to check. An out-of-range date isn't a mechanical test such as "is this value smaller than 1000". It requires a deep understanding of the gregorian calendar system which chrono itself is responsible for providing. It's implausible that uses ensure that its input passes the required condition due to complexity. So the first test probably fails systematically across many projects.

Runtime invariant inputs or tests

It's easy to find many uses of these functions where the input is runtime variant and not tests. I've found that many of these relate to ad-hoc use in combination with serde, which is a strong indication that its arguments truly are runtime variant. A bit of taxonomy lead me to believe that there might be a case of systematic misuse.

What to do?

I believe many of the examples one can find of chrono production use fails the tests I proposed. The way to correct these uses would be through the *_opt family of methods. But since the non-opt methods seems to be considered the default and probably leads to systematic problems when used I think it's worth considering changing them.

What I'd suggest and would like to discuss is:

  • Rewrite all panicking functions that do not have an easy precondition to test to be fallible instead.
  • Consider rewriting other functions which panic if there is a high risk of misuse where the caller should actually be using the *_opt sibling function but for one reason or another doesn't.
  • For all the rewritten functions: deprecate or remove their *_opt sibling function.

Thank you!

udoprog avatar Sep 10 '22 01:09 udoprog

Thanks for raising the issue @udoprog. I think it should definitely be considered whether we promote the Option returning methods as the default ones, rather than the panicking ones. We could also consider whether returning a Result with an error type that implements Error would be even better, however this would have to be gated by the std feature. Now is a good time to do this as we are now working on a 0.5 release which can include breaking changes.

My summary of things to consider:

  • Consider having the fallible methods be the default, with the panicking methods having some sort of common suffix (or just require users to ? or unwrap them)
  • Consider more use of the type system (eg https://github.com/chronotope/chrono/issues/727) - this works well for Month (similar to the "parse, dont' validate" strategy)
  • Helper methods where returns have multiple failure cases (eg https://github.com/chronotope/chrono/issues/716 and also on TimeDelta)
  • Add and Sub - some implementations can panic (eg Add<Days> for DateTime<Tz>) - potentially we should only implement these for cases where the only way to panic is due to numeric overflow (we also have to consider here numeric overflow of the underlying storage, or simply reaching the maximum allowable year value)

esheppa avatar Sep 10 '22 04:09 esheppa

Yes, one of things on my list for 0.5 is to audit all causes of panic in Chrono and remove it where possible. The cases that would be hard to prevent is things like Add or Sub impls that might overflow, where we have started making sure we offer checked variants as well.

Any help you could offer on this topic would be much appreciated!

djc avatar Sep 10 '22 07:09 djc

Thanks for the positive reception! I'd be glad to put together a PR as a proposal now that I've gotten a feel for the waters.

udoprog avatar Sep 10 '22 13:09 udoprog

So I've got a work-in-progress branch where I tested some of the above things out, a preliminary review to see if it's on the right track would be nice before I do the work of making all doctests pass 😅. All the regular unit tests do pass.

Note: this has been opened as #817

The broad strokes are:

I've opted for a single ChronoError type which has a private kind that determines the message presented to the user (but is not public). I initially started with one error type per kind of error, but eventually noticed that it would make propagating errors overly hard like how constructing a DateTime either raises an error because the date or the time was invalid.

I've opted to make loading the local timezone fallible instead of panicking (note that I have not yet changed the underlying implementations to be fallible). I think this is motivated because technically the underlying platform might perform checks which might fail all though what these look like exactly is a bit unclear. This has the side effect of making a number of TimeZone methods fallible - which is unfortunate because a number of timezones can easily be interacted with without erroring (Utc and FixedOffset). To mitigate this I've introduced a separate FixedTimeZone trait that extends the TimeZone trait which has a number of non-fallible methods.

Making things fallible had the side effect of clearly marking the standard trait implementations which might panic, these are:

  • The From<T> implementations which interacts with the Local timezone, because this doesn't implement FixedTimeZone. Where appropriate these have been moved to be TryFrom<Error = ChronoError> implementations.
  • As mentioned above Add and Sub.

If you think something is going in the wrong direction I'd appreciate the guidance.

udoprog avatar Sep 10 '22 23:09 udoprog

We could also consider whether returning a Result with an error type that implements Error would be even better, however this would have to be gated by the std feature.

core::error::Error exists in nightly now.

Another alternative is to use the core2 crate to polyfill the error type for no_std targets.

bbqsrc avatar Sep 11 '22 11:09 bbqsrc

I think returning Result for these cases would be mostly more idiomatic than Option, we can guard implementations of the std Error trait as necessary.

djc avatar Sep 11 '22 11:09 djc

I think returning Result for these cases would be mostly more idiomatic than Option, we can guard implementations of the std Error trait as necessary.

That's what I've opted for in #817. I also thought that's what @esheppa meant!

udoprog avatar Sep 11 '22 16:09 udoprog