chrono icon indicating copy to clipboard operation
chrono copied to clipboard

Preserve `-00:00` offset

Open pitdicker opened this issue 2 years ago • 10 comments

RFC 3339 and RFC 2822 make a distinction between an offset of +00:00 and -00:00.

RFC 2822:

The form "+0000" SHOULD be used to indicate a time zone at Universal Time. Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local time zone other than Universal Time and therefore indicates that the date-time contains no information about the local time zone.

From RFC3339:

If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time.

Currently chrono drops the distinction, but I would like to preserve it.

  • I noticed format/scan.rs was already written with the functionality of this RFCs in mind, but did not go all the way. It now follows the specs better and can sometimes give better parser errors.
  • Modifying Parsed to preserve this information is a bit messy as all the methods are public. Adding a public constant NO_OFFSET_INFO seemed the cleanest solution. Parsing doesn't support more than two digits for the offset hours, limiting the values that are passed to Parsed::set_offset to 99 * 3600 + 59 * 60 = 359940. The value of NO_OFFSET_INFO is way outside that range, so no chance for conflicts.
  • FixedOffset can be modified in a reasonably clean way to encode -0: shift the value 2 bits left and encode this special state as an OFFSET_UNKNOWN flag in the rightmost bits. Advantage of this encoding is that it only takes a single shift right to get the offset in seconds, so this should not measurably impact performance.
  • The same shifting trick enables using NonZeroI32, as long as we set one of the rightmost bits to something non-null with the OFFSET_NORMAL flag. This gives FixedOffset a niche. Option<DateTime<FixedOffset>> is now the same size as DateTime<FixedOffset>: 16 bytes instead of 20.
  • For comparisons and hashing the flags are ignored. This way -00:00 is treated as equal to +00:00, which is usually what is expected. Also this way we remain backwards compatible.
  • FixedOffset::no_offset_info returns whether the offset is -00:00.
  • FixedOffset::OFFSET_UNKNOWN can be used to initialize an offset to -00:00.

pitdicker avatar Apr 28 '23 19:04 pitdicker

Currently chrono drops the distinction, but I would like to preserve it.

Do you have an actual use case for this?

djc avatar Apr 28 '23 20:04 djc

Ai, why do you have to ask that :rofl: ? As a hobby I am making my own feed reader, and I would like to know the time an entry was published by the author in his own timezone. If the timestamp indicates it knows the time in UTC but the relation to the local time is unknown, I find that interesting. But currently chrono offers no easy way to know.

pitdicker avatar Apr 28 '23 20:04 pitdicker

As a better reason: we would follow the RFCs more closely, and be able to round-trip these timestamps.

pitdicker avatar Apr 28 '23 20:04 pitdicker

In general, I would concur we want to follow the RFCs closely. But at the same time, "you find it interesting" to know this doesn't seem like a strong justification for adding ~90 lines of code on net.

djc avatar Apr 28 '23 20:04 djc

That is fair.

Honestly the changes felt small, so i did some counting:

  • 17 lines for testing the size of common types in chrono.
  • 30 would also be there if we wanted to use the part of encoding an offset in an NonZeroI32 to get the niche optimization.
  • that leaves ca 45 netto for the feature and documentation

pitdicker avatar Apr 28 '23 20:04 pitdicker

RFC 3339 and RFC 2822 make a distinction between an offset of +00:00 and -00:00.

RFC 2822:

The form "+0000" SHOULD be used to indicate a time zone at Universal Time. Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local time zone other than Universal Time and therefore indicates that the date-time contains no information about the local time zone.

From RFC3339:

If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time.

Currently chrono drops the distinction, but I would like to preserve it.

* I noticed `format/scan.rs` was already written with the functionality of this RFCs in mind, but did not go all the way. It now follows the specs better and can sometimes give better parser errors.

* Modifying `Parsed` to preserve this information is a bit messy as all the methods are public. Adding a public constant `NO_OFFSET_INFO` seemed the cleanest solution. Parsing doesn't support more than two digits for the offset hours, limiting the values that are passed to `Parsed::set_offset` to `99 * 3600 + 59 * 60 = 359940`. The value of `NO_OFFSET_INFO` is way outside that range, so no chance for conflicts.

* `FixedOffset` can be modified in a reasonably clean way to encode `-0`: shift the value 2 bits left and encode this special state as an `OFFSET_UNKNOWN` flag in the rightmost bits. Advantage of this encoding is that it only takes a single shift right to get the offset in seconds, so this should not measurably impact performance.

* The same shifting trick enables using `NonZeroI32`, as long as we set one of the rightmost bits to something non-null with the `OFFSET_NORMAL` flag. This gives `FixedOffset` a niche. `Option<DateTime<FixedOffset>>` is now the same size as `DateTime<FixedOffset>`: 16 bytes instead of 20.

* For comparisons and hashing the flags are ignored. This way `-00:00` is treated as equal to `+00:00`, which is usually what is expected. Also this way we remain backwards compatible.

* `FixedOffset::no_offset_info` returns whether the offset is `-00:00`.

* `FixedOffset::OFFSET_UNKNOWN` can be used to initialize an offset to `-00:00`.

Can you find an appropriate docstring to place this information (especiially the quoted RFC stuff)? This is a great explanation of little finicky area that I suspect quite a few users will run into. You might find it appropriate to split up the info and place within various user-facing functions. Whatever you think is good. So long as this subtlety is explained to the user.

jtmoon79 avatar Apr 29 '23 22:04 jtmoon79

Interestingly, this was a common case that was causing failures in tests/dateutils.rs::verify_against_date_command_format_local, if this is merged, we may be able to ensure the offsets match as well, rather than just the date/time values.

esheppa avatar Apr 30 '23 13:04 esheppa

Can you find an appropriate docstring to place this information (especially the quoted RFC stuff)? This is a great explanation of little finicky area that I suspect quite a few users will run into. You might find it appropriate to split up the info and place within various user-facing functions. Whatever you think is good. So long as this subtlety is explained to the user.

Good idea, I have written an introduction for the FixedOffset type, and included a small reference in the docstring of the Parsed::offset field.

pitdicker avatar May 04 '23 09:05 pitdicker

Interestingly, this was a common case that was causing failures in tests/dateutils.rs::verify_against_date_command_format_local, if this is merged, we may be able to ensure the offsets match as well, rather than just the date/time values.

Great if this would help with an existing test :smile:.

pitdicker avatar May 04 '23 09:05 pitdicker

Codecov Report

Attention: Patch coverage is 91.95402% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 92.16%. Comparing base (e292d9b) to head (cc18784).

Files Patch % Lines
src/offset/fixed.rs 79.31% 6 Missing :warning:
src/format/parsed.rs 87.50% 1 Missing :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1042   +/-   ##
=======================================
  Coverage   92.16%   92.16%           
=======================================
  Files          40       40           
  Lines       18052    18101   +49     
=======================================
+ Hits        16637    16683   +46     
- Misses       1415     1418    +3     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Feb 04 '24 16:02 codecov[bot]