chrono icon indicating copy to clipboard operation
chrono copied to clipboard

Why does `chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H")` pass?

Open MarcoGorelli opened this issue 1 year ago • 15 comments

Here's a reproducible example:

use chrono;
fn main() {
    println!("{:?}", chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H"));
    println!("{:?}", chrono::NaiveDate::parse_from_str("2020-01", "%Y-%m"));
}

This outputs

Ok(2020-01-01)
Err(ParseError(NotEnough))

My question is - would it be possible to introduce a new ParseError, and have the first one throw ParseError(TooMany), instead of silently dropping '%H'?


As an aside, thanks for your work on this awesome project

MarcoGorelli avatar May 21 '23 07:05 MarcoGorelli

@MarcoGorelli would you be willing to submit a PR for the 0.4.x branch?

jtmoon79 avatar May 21 '23 22:05 jtmoon79

totally! I'll give this a go, thanks, this would be useful in polars

MarcoGorelli avatar May 22 '23 08:05 MarcoGorelli

closing per discussion in https://github.com/chronotope/chrono/pull/1079

MarcoGorelli avatar May 23 '23 10:05 MarcoGorelli

reopening as per https://github.com/chronotope/chrono/pull/1079#issuecomment-1559080812

MarcoGorelli avatar May 23 '23 11:05 MarcoGorelli

I don't know what to propose as a solution, but what would be really useful in polars would be if there was a way to tell if the format string contains more elements than will end up in the result, so that chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H") won't silently drop the hour component

MarcoGorelli avatar May 23 '23 11:05 MarcoGorelli

a way to tell if the format string contains more elements than will end up in the result

This is a safer approach to creating dates/datetimes so I like it.

Off-hand, I can think of three solutions:

  1. a flexible solution would be a parameter allow_unused: bool; when true it silently drops unused components, when false it returns None. (off-hand idea: Perhaps this new allow_unused: boolcould be default argument with help of crate default_args?)
  2. another solution is different functions; parse_from_str retains current behavior and new function parse_from_str_strict does not allow unused components and returns None if there are any.
  3. the least flexible is to change the behavior to return None in presence of unprocessed/unusable specifiers. This is the easiest to debug and upkeep.

jtmoon79 avatar May 24 '23 07:05 jtmoon79

We're definitely not going to depend on the default_args crate in chrono. I think we can have NaiveDate::parse_from_str() yield a ParseError with a new ParseErrorKind, you just can't change public API to get there.

djc avatar May 24 '23 09:05 djc

I think we can have NaiveDate::parse_from_str() yield a ParseError with a new ParseErrorKind

That would be good, but in https://github.com/chronotope/chrono/pull/1079#issuecomment-1559031545 it was brought up that it might be a valid use case to only parse the date part

MarcoGorelli avatar May 24 '23 10:05 MarcoGorelli

Related issue: https://github.com/chronotope/chrono/issues/528

pitdicker avatar May 24 '23 15:05 pitdicker

In case it helps, here's a minimal reproducer of the larger issue in polars:

use chrono;

fn parser(ts: &str, fmt: &str) -> Option<i64> {
    let res = chrono::NaiveDateTime::parse_from_str(ts, fmt)
        .ok()
        .map(|ndt| ndt.timestamp_millis());
    let res = res.or_else(|| {
        chrono::NaiveDate::parse_from_str(ts, fmt)
            .ok()
            .map(|nd| nd.and_hms_opt(0, 0, 0).unwrap().timestamp_millis())
    });
    res
}

fn main() {
    println!("res: {:?}", parser("2020-01-01 10", "%Y-%m-%d %H"));
    println!("res: {:?}", parser("2020-01-01", "%Y-%m-%d"));
}

This outputs

res: Some(1577836800000)
res: Some(1577836800000)

but ideally, in polars, the first one would need to be None.

On the polars side, would erroring if the given fmt contains ('%H' or '%I') and not '%M' be dangerous in any way?

Thanks all for your engagement here, much appreciated 🙏

MarcoGorelli avatar May 25 '23 14:05 MarcoGorelli

Ok. I have started playing with this, and most of it is easy to adjust.

  • When parsing a NaiveTime seconds and nanoseconds may already be omitted. With the note that seconds may only be omitted if nanoseconds are also omitted. It is easy to extend this to minutes.

  • When parsing DateTime we can default to the first day of the month when parsing year-month-day, or default to the first day of the week when parsing year-week-weekday.

  • Turning just a year into a full NaiveDate or NaiveDateTime seems a bit too much to me.

  • Parsing NaiveDateTime reuses the parts for DateTime and NaiveTime. ~It can be adjusted to default to midnight when parsing NaiveTime fails with NOT_ENOUGH.~ Edit: This is a bit more involved, it should instead check none of the time fields are populated.

Next I hit a guarantee in the documentation of NaiveDateTime::parse_from_str that seems a bit unfortunate:

Offset is ignored for the purpose of parsing.

assert_eq!(parse_from_str("2014-5-17T12:34:56+09:30", "%Y-%m-%dT%H:%M:%S%z"),
           Ok(NaiveDate::from_ymd_opt(2014, 5, 17).unwrap().and_hms_opt(12, 34, 56).unwrap()));

I would say input like this should just be parsed as DateTime<FixedOffset>. @djc Would it be akay to break this?

pitdicker avatar May 25 '23 18:05 pitdicker

One step further... When parsing NaiveDateTime it seems fine to add defaults. But when parsing a DateTime<FixedOffset> I would say that everything except seconds and nanoseconds need to be present. Defining an offset while the input doesn't contain minutes or even days doesn't make sense.

So better make it configurable. This can be done with an extra field on Parsed.

pitdicker avatar May 25 '23 18:05 pitdicker

I would say input like this should just be parsed as DateTime<FixedOffset>. @djc Would it be akay to break this?

While I conceptually agree, I feel like semver does not allow us to break this documented guarantee in 0.4.x.

The rest of your proposed logic sounds good to me. Should clearly document that...

djc avatar May 26 '23 11:05 djc

In addition to the above, would you consider parsing into NaiveDateTime if "hour" but no "minute" directive is provided?

This comes up sometimes in polars, e.g. https://github.com/pola-rs/polars/issues/10178

MarcoGorelli avatar Aug 12 '23 08:08 MarcoGorelli

@MarcoGorelli see #1094.

djc avatar Aug 12 '23 19:08 djc