hayagriva icon indicating copy to clipboard operation
hayagriva copied to clipboard

Support date times in date fields

Open m-haug opened this issue 3 years ago • 5 comments

This PR supports RFC3339 formatted datetimes in date fields. The datetimes are read and stored with maximum available precision.

The following points could still be addressed:

  • [ ] With the current implementation the string "19 94" is parsed as "1994". Should this be fixed?
  • [x] Roundtrips are not fully idempotent: the offset Z turns into +00:00.
  • [ ] The tests are relatively limited, perhaps property-based tests could be implemented to improve this.
  • [x] Dates could also support conversion from/to YAML hashes.

Technically, this is a breaking change, since the Date struct got new fields. For the future, this could be fixed by adding #[non_exhaustive] to the struct definition.

Fixes #30.

m-haug avatar Aug 18 '22 12:08 m-haug

I have a change request with regards to error handling in the parse function. The parsing error should be reported instead of panicking or silently ignoring the malformed date.

Yes, the panic was an oversight. Ignoring the ParseErrorKind::TooShort has been deliberate however, since chrono's RFC3339 parser expects a full datetime, including offset. I've documented this in an additional commit.

If at all possible, whitespace should not be accepted within the dates' numbers. However, if this is not possible with chrono without preprocessing, we could keep the current behavior. Alternatively, unscanny could be used for a small custom RFC3339 parser instead of chrono. I wanted to get rid of the regexes in favor of unscanny parsers anyways.

There's no need to fix chrono's parser. The problem is in hayagriva and can be fixed rather simply.

However, I've noticed that by ignoring ParseErrorKind::TooShort, we also accept dates like 1994-. If we want to fix this, we would probably have to implement a custom RFC3339 parser anyways. This could also be an opportunity to improve the error handling, since chrono's errors are admittedly not good.

Should we go down that road?

I am okay with the roundtrip conversion from "Z" to "+00:00". Is there a semantic difference between the Zulu timezone and "+00:00"? At any rate, they specify the same time zone right now and I think that's unlikely to change unless the IETF wants to break the world.

No, there's no semantic difference. It's purely syntactical.

Aren't YAML date hashes a bit niche? At any rate, I think they are not within the scope of this PR.

Breaking changes are fine at this stage. I do not think that Date needs a #[non_exhaustive]. Changes to the struct will likely change the semantics of all of its fields and also change what users can express and thus choose to, so the notification to library consumers that something is up is warranted.

I think we could improve type safety by using an enum instead of options. That way, encoding "impossible" configurations like year + day, but no month would become impossible. The existing members could then be accessors on the enum. Does that sound good?

m-haug avatar Aug 19 '22 07:08 m-haug

Hi @reknih, just a quick follow-up. Have you had some time to review the fixes and think about the questions?

m-haug avatar Aug 25 '22 06:08 m-haug

Hey @m-haug, @reknih is currently on vacation, so you'll probably have to wait some more.

On a side note, I'm curious about the enum thing. I see that it would prevent impossible configurations, but wouldn't there be lots of boilerplate and overlapping variants?

laurmaedje avatar Aug 25 '22 07:08 laurmaedje

Thanks for the heads-up and the feedback!

Yes, there would probably be some duplication in the enum itself, since "larger" variants, like a full datetime, have to contain the same information as "smaller" variants, like a date or a year only. This also depends on some other decisions surrounding the design of the parser. For example, should a time always contain a minute, or is an hour specification sufficient?

However, I don't think there would necessarily be lots of boilerplate when using the enum design, especially on the consumer side. By providing accessors (month / get_month), the existing interface could mostly be preserved (apart from requiring a method call). Internally, some boilerplate would probably be necessary to provide the accessors, but thanks to pattern matching with or-patterns that should also be limited.

It should also be possible to provide mutations capabilities in the enum design, albeit with more effort. Of the top of my head, I can see

  • precision-preserving operations, like set_month,
  • narrowing operations, like round_to_month or truncate_to_month,
  • and widening operations, like append_month.

The naming is obviously open to bike-shedding. The first and the third category may have some overlap, depending on the chosen semantics. For example, set_day on a date containing only year and month could automatically widen the date once.

If you have further feedback, or see anything that I may not have considered yet, I'd be thrilled to hear about it.

m-haug avatar Aug 25 '22 08:08 m-haug

The enum + methods design sounds good, in principle. One other consideration is whether it's a public enum or a private enum within a struct. That way we could also enforce that the individual fields are within bounds. It might also make sense to have a Date and a Time struct/enum which could be composed in a Datetime struct with Date and Option<Time> fields.

laurmaedje avatar Aug 26 '22 17:08 laurmaedje

Is there still interesting in completing this?

laurmaedje avatar Sep 04 '23 09:09 laurmaedje

Closed because of inactivity. #30 will remain open to track this.

reknih avatar Nov 21 '23 11:11 reknih