chrono
chrono copied to clipboard
Preserve `-00:00` offset
RFC 3339 and RFC 2822 make a distinction between an offset of +00:00 and -00:00.
RFC 2822:
The form "+0000" SHOULD be used to indicate a time zone at Universal Time. Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local time zone other than Universal Time and therefore indicates that the date-time contains no information about the local time zone.
From RFC3339:
If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time.
Currently chrono drops the distinction, but I would like to preserve it.
- I noticed
format/scan.rswas already written with the functionality of this RFCs in mind, but did not go all the way. It now follows the specs better and can sometimes give better parser errors. - Modifying
Parsedto preserve this information is a bit messy as all the methods are public. Adding a public constantNO_OFFSET_INFOseemed the cleanest solution. Parsing doesn't support more than two digits for the offset hours, limiting the values that are passed toParsed::set_offsetto99 * 3600 + 59 * 60 = 359940. The value ofNO_OFFSET_INFOis way outside that range, so no chance for conflicts. FixedOffsetcan be modified in a reasonably clean way to encode-0: shift the value 2 bits left and encode this special state as anOFFSET_UNKNOWNflag in the rightmost bits. Advantage of this encoding is that it only takes a single shift right to get the offset in seconds, so this should not measurably impact performance.- The same shifting trick enables using
NonZeroI32, as long as we set one of the rightmost bits to something non-null with theOFFSET_NORMALflag. This givesFixedOffseta niche.Option<DateTime<FixedOffset>>is now the same size asDateTime<FixedOffset>: 16 bytes instead of 20. - For comparisons and hashing the flags are ignored. This way
-00:00is treated as equal to+00:00, which is usually what is expected. Also this way we remain backwards compatible. FixedOffset::no_offset_inforeturns whether the offset is-00:00.FixedOffset::OFFSET_UNKNOWNcan be used to initialize an offset to-00:00.
Currently chrono drops the distinction, but I would like to preserve it.
Do you have an actual use case for this?
Ai, why do you have to ask that :rofl: ? As a hobby I am making my own feed reader, and I would like to know the time an entry was published by the author in his own timezone. If the timestamp indicates it knows the time in UTC but the relation to the local time is unknown, I find that interesting. But currently chrono offers no easy way to know.
As a better reason: we would follow the RFCs more closely, and be able to round-trip these timestamps.
In general, I would concur we want to follow the RFCs closely. But at the same time, "you find it interesting" to know this doesn't seem like a strong justification for adding ~90 lines of code on net.
That is fair.
Honestly the changes felt small, so i did some counting:
- 17 lines for testing the size of common types in chrono.
- 30 would also be there if we wanted to use the part of encoding an offset in an
NonZeroI32to get the niche optimization. - that leaves ca 45 netto for the feature and documentation
RFC 3339 and RFC 2822 make a distinction between an offset of
+00:00and-00:00.RFC 2822:
The form "+0000" SHOULD be used to indicate a time zone at Universal Time. Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local time zone other than Universal Time and therefore indicates that the date-time contains no information about the local time zone.
From RFC3339:
If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time.
Currently chrono drops the distinction, but I would like to preserve it.
* I noticed `format/scan.rs` was already written with the functionality of this RFCs in mind, but did not go all the way. It now follows the specs better and can sometimes give better parser errors. * Modifying `Parsed` to preserve this information is a bit messy as all the methods are public. Adding a public constant `NO_OFFSET_INFO` seemed the cleanest solution. Parsing doesn't support more than two digits for the offset hours, limiting the values that are passed to `Parsed::set_offset` to `99 * 3600 + 59 * 60 = 359940`. The value of `NO_OFFSET_INFO` is way outside that range, so no chance for conflicts. * `FixedOffset` can be modified in a reasonably clean way to encode `-0`: shift the value 2 bits left and encode this special state as an `OFFSET_UNKNOWN` flag in the rightmost bits. Advantage of this encoding is that it only takes a single shift right to get the offset in seconds, so this should not measurably impact performance. * The same shifting trick enables using `NonZeroI32`, as long as we set one of the rightmost bits to something non-null with the `OFFSET_NORMAL` flag. This gives `FixedOffset` a niche. `Option<DateTime<FixedOffset>>` is now the same size as `DateTime<FixedOffset>`: 16 bytes instead of 20. * For comparisons and hashing the flags are ignored. This way `-00:00` is treated as equal to `+00:00`, which is usually what is expected. Also this way we remain backwards compatible. * `FixedOffset::no_offset_info` returns whether the offset is `-00:00`. * `FixedOffset::OFFSET_UNKNOWN` can be used to initialize an offset to `-00:00`.
Can you find an appropriate docstring to place this information (especiially the quoted RFC stuff)? This is a great explanation of little finicky area that I suspect quite a few users will run into. You might find it appropriate to split up the info and place within various user-facing functions. Whatever you think is good. So long as this subtlety is explained to the user.
Interestingly, this was a common case that was causing failures in tests/dateutils.rs::verify_against_date_command_format_local, if this is merged, we may be able to ensure the offsets match as well, rather than just the date/time values.
Can you find an appropriate docstring to place this information (especially the quoted RFC stuff)? This is a great explanation of little finicky area that I suspect quite a few users will run into. You might find it appropriate to split up the info and place within various user-facing functions. Whatever you think is good. So long as this subtlety is explained to the user.
Good idea, I have written an introduction for the FixedOffset type, and included a small reference in the docstring of the Parsed::offset field.
Interestingly, this was a common case that was causing failures in
tests/dateutils.rs::verify_against_date_command_format_local, if this is merged, we may be able to ensure the offsets match as well, rather than just the date/time values.
Great if this would help with an existing test :smile:.
Codecov Report
Attention: Patch coverage is 91.95402% with 7 lines in your changes are missing coverage. Please review.
Project coverage is 92.16%. Comparing base (
e292d9b) to head (cc18784).
| Files | Patch % | Lines |
|---|---|---|
| src/offset/fixed.rs | 79.31% | 6 Missing :warning: |
| src/format/parsed.rs | 87.50% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #1042 +/- ##
=======================================
Coverage 92.16% 92.16%
=======================================
Files 40 40
Lines 18052 18101 +49
=======================================
+ Hits 16637 16683 +46
- Misses 1415 1418 +3
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.