serde_with icon indicating copy to clipboard operation
serde_with copied to clipboard

Incorrect nanosecond serialization for very large or small values

Open matt-phylum opened this issue 1 year ago • 3 comments

TimestampNanoSeconds has the following serialization behavior:

  • ..1385-06-12T00:25:26.290448385Z: panic overflow when multiplying duration by scalar
  • 1385-06-12T00:25:26.290448385Z..1677-09-21T00:12:43.145224192Z: incorrect positive value
  • 1677-09-21T00:12:43.145224192Z: panic attempt to negate with overflow
  • 1677-09-21T00:12:43.145224193Z..2262-04-11T23:47:16.854775808Z: correct
  • 2262-04-11T23:47:16.854775808Z..2554-07-21T23:34:33.709551616Z: incorrect negative value
  • 2554-07-21T23:34:33.709551616Z..: panic overflow when multiplying duration by scalar

Not being able to serialize values outside the range 1677-09-21T00:12:43.145224193Z..2262-04-11T23:47:16.854775808Z is expected since it's serializing to an i64, but if the value is outside that range it should return an error, not panic or return incorrect values.

use chrono::{DateTime, Utc};
use serde::Serialize;
use serde_with::serde_as;

fn main() {
    #[serde_as]
    #[derive(Serialize)]
    struct S(#[serde_as(as = "serde_with::TimestampNanoSeconds")] DateTime<Utc>);

    fn try_serialize(v: DateTime<Utc>) {
        let s = std::panic::catch_unwind(|| {
            serde_json::to_string(&S(v)).unwrap_or_else(|e| format!("Serialization error: {e}"))
        })
        .unwrap_or_else(|p| {
            format!(
                "Panic: {}",
                p.downcast_ref::<&str>()
                    .map(|s| &**s)
                    .unwrap_or_else(|| p.downcast_ref::<String>().map(|s| &**s).unwrap())
            )
        });
        println!("{v}: {s}");
    }

    try_serialize(DateTime::<Utc>::MIN_UTC);
    try_serialize("1385-06-12T00:25:26.290448384Z".parse().unwrap());
    try_serialize("1385-06-12T00:25:26.290448385Z".parse().unwrap());
    try_serialize("1677-09-21T00:12:43.145224191Z".parse().unwrap());
    try_serialize("1677-09-21T00:12:43.145224192Z".parse().unwrap());
    try_serialize("1677-09-21T00:12:43.145224193Z".parse().unwrap());
    try_serialize("2262-04-11T23:47:16.854775807Z".parse().unwrap());
    try_serialize("2262-04-11T23:47:16.854775808Z".parse().unwrap());
    try_serialize("2554-07-21T23:34:33.709551615Z".parse().unwrap());
    try_serialize("2554-07-21T23:34:33.709551616Z".parse().unwrap());
    try_serialize(DateTime::<Utc>::MAX_UTC);
}
-262143-01-01 00:00:00 UTC: Panic: overflow when multiplying duration by scalar
1385-06-12 00:25:26.290448384 UTC: Panic: overflow when multiplying duration by scalar
1385-06-12 00:25:26.290448385 UTC: 1
1677-09-21 00:12:43.145224191 UTC: 9223372036854775807
1677-09-21 00:12:43.145224192 UTC: Panic: attempt to negate with overflow
1677-09-21 00:12:43.145224193 UTC: -9223372036854775807
2262-04-11 23:47:16.854775807 UTC: 9223372036854775807
2262-04-11 23:47:16.854775808 UTC: -9223372036854775808
2554-07-21 23:34:33.709551615 UTC: -1
2554-07-21 23:34:33.709551616 UTC: Panic: overflow when multiplying duration by scalar
+262142-12-31 23:59:59.999999999 UTC: Panic: overflow when multiplying duration by scalar

A similar problem exists with TimestampNanoSecondsWithFrac:

  • ..1385-06-12T00:25:26.290448385Z: panic overflow when multiplying duration by scalar
  • 1385-06-12T00:25:26.290448385Z..2554-07-21T23:34:33.709551616Z: correct
  • 2554-07-21T23:34:33.709551616Z..: panic overflow when multiplying duration by scalar

TimeStampNanoSecondsWithFrac should never fail because it is serializing to f64 and f64 can represent much larger values, at the expense of precision.

The other TimeStamp* serializers are not affected because the range of values that can be represented by DateTime is smaller than the range of values that can be represented by an i64.

DurationNanoSeconds and DurationNanoSecondsFrac also have a similar problem:

  • ..18446744073.709551616s: correct
  • 18446744073.709551616s..: panic: overflow when multiplying duration by scalar

DurationNanoSeconds should return an error instead of panicking, and DurationNanoSecondsFrac should never fail.

DurationMicroSeconds and DurationMicroSecondsFrac fail the same way for 18446744073709.551616s... 18446744073709551.616s.. for DurationMilliSeconds and DurationMilliSecondsWithFrac. DurationSeconds and DurationSecondsWithFrac are unaffected.

matt-phylum avatar Jul 19 '24 18:07 matt-phylum

I was planning to open another issue for having a DefaultOnError that works for serialization, but SerializeAs doesn't have a way to distinguish between errors because the value is unrepresentable and errors because the underlying serializer reported an error. Maybe the fallible (without fraction) versions could to take a type parameter that defines whether an out of range value is skipped or None or 0.

matt-phylum avatar Jul 19 '24 18:07 matt-phylum

Thank you for the report. Yes, there is indeed a problem here. The code uses the normal math operations and some as casts. Fixing all of the instances will be a bit difficult, since the clippy lints for that will lead to many warnings, but not all of them actually being problematic. Many of the values will not be supported since they cannot be represented internally.

How did you come up with those time values to test? I wonder if you have a list of such strings/timestamps or if you used automated tools.

I do want to fix the issue, but it might take a while. It seems to require some steps

  • [ ] Check/Remove the use of ops::Mul and ops::Neg since they can fail. This covers the panics reported here.
  • [ ] Check all as casts. That should address "2262-04-11 23:47:16.854775808 UTC: -9223372036854775808".
  • [ ] Check all arithmetic operations. If there is an adjustment necessary, because of sub-unit precision, then the value might need to be adjusted +/- 1, which could lead to over/underflows. Not sure if that applies here, since those are all Nanoseconds, but would apply when using TimestampSeconds with a >500ms subsecond part.
  • [ ] Document the supported time ranges for the different types.

jonasbb avatar Jul 21 '24 14:07 jonasbb

I used binary search to find these edges between behaviors. I don't know if there's a list of such timestamps, but 9,223,372,036,854,775,807 is the limit of an i64 so those are probably good values to check for any implementation.

matt-phylum avatar Jul 22 '24 12:07 matt-phylum