chrono
chrono copied to clipboard
Why does `chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H")` pass?
Here's a reproducible example:
use chrono;
fn main() {
println!("{:?}", chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H"));
println!("{:?}", chrono::NaiveDate::parse_from_str("2020-01", "%Y-%m"));
}
This outputs
Ok(2020-01-01)
Err(ParseError(NotEnough))
My question is - would it be possible to introduce a new ParseError
, and have the first one throw ParseError(TooMany)
, instead of silently dropping '%H'
?
As an aside, thanks for your work on this awesome project
@MarcoGorelli would you be willing to submit a PR for the 0.4.x
branch?
totally! I'll give this a go, thanks, this would be useful in polars
closing per discussion in https://github.com/chronotope/chrono/pull/1079
reopening as per https://github.com/chronotope/chrono/pull/1079#issuecomment-1559080812
I don't know what to propose as a solution, but what would be really useful in polars would be if there was a way to tell if the format string contains more elements than will end up in the result, so that chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H")
won't silently drop the hour
component
a way to tell if the format string contains more elements than will end up in the result
This is a safer approach to creating dates/datetimes so I like it.
Off-hand, I can think of three solutions:
- a flexible solution would be a parameter
allow_unused: bool
; whentrue
it silently drops unused components, whenfalse
it returnsNone
. (off-hand idea: Perhaps this newallow_unused: bool
could be default argument with help of cratedefault_args
?) - another solution is different functions;
parse_from_str
retains current behavior and new functionparse_from_str_strict
does not allow unused components and returnsNone
if there are any. - the least flexible is to change the behavior to return
None
in presence of unprocessed/unusable specifiers. This is the easiest to debug and upkeep.
We're definitely not going to depend on the default_args crate in chrono. I think we can have NaiveDate::parse_from_str()
yield a ParseError
with a new ParseErrorKind
, you just can't change public API to get there.
I think we can have NaiveDate::parse_from_str() yield a ParseError with a new ParseErrorKind
That would be good, but in https://github.com/chronotope/chrono/pull/1079#issuecomment-1559031545 it was brought up that it might be a valid use case to only parse the date part
Related issue: https://github.com/chronotope/chrono/issues/528
In case it helps, here's a minimal reproducer of the larger issue in polars:
use chrono;
fn parser(ts: &str, fmt: &str) -> Option<i64> {
let res = chrono::NaiveDateTime::parse_from_str(ts, fmt)
.ok()
.map(|ndt| ndt.timestamp_millis());
let res = res.or_else(|| {
chrono::NaiveDate::parse_from_str(ts, fmt)
.ok()
.map(|nd| nd.and_hms_opt(0, 0, 0).unwrap().timestamp_millis())
});
res
}
fn main() {
println!("res: {:?}", parser("2020-01-01 10", "%Y-%m-%d %H"));
println!("res: {:?}", parser("2020-01-01", "%Y-%m-%d"));
}
This outputs
res: Some(1577836800000)
res: Some(1577836800000)
but ideally, in polars, the first one would need to be None
.
On the polars side, would erroring if the given fmt
contains ('%H'
or '%I'
) and not '%M'
be dangerous in any way?
Thanks all for your engagement here, much appreciated 🙏
Ok. I have started playing with this, and most of it is easy to adjust.
-
When parsing a
NaiveTime
seconds and nanoseconds may already be omitted. With the note that seconds may only be omitted if nanoseconds are also omitted. It is easy to extend this to minutes. -
When parsing
DateTime
we can default to the first day of the month when parsing year-month-day, or default to the first day of the week when parsing year-week-weekday. -
Turning just a year into a full
NaiveDate
orNaiveDateTime
seems a bit too much to me. -
Parsing
NaiveDateTime
reuses the parts forDateTime
andNaiveTime
. ~It can be adjusted to default to midnight when parsingNaiveTime
fails withNOT_ENOUGH
.~ Edit: This is a bit more involved, it should instead check none of the time fields are populated.
Next I hit a guarantee in the documentation of NaiveDateTime::parse_from_str
that seems a bit unfortunate:
Offset is ignored for the purpose of parsing.
assert_eq!(parse_from_str("2014-5-17T12:34:56+09:30", "%Y-%m-%dT%H:%M:%S%z"), Ok(NaiveDate::from_ymd_opt(2014, 5, 17).unwrap().and_hms_opt(12, 34, 56).unwrap()));
I would say input like this should just be parsed as DateTime<FixedOffset>
. @djc Would it be akay to break this?
One step further... When parsing NaiveDateTime
it seems fine to add defaults. But when parsing a DateTime<FixedOffset>
I would say that everything except seconds and nanoseconds need to be present. Defining an offset while the input doesn't contain minutes or even days doesn't make sense.
So better make it configurable. This can be done with an extra field on Parsed
.
I would say input like this should just be parsed as
DateTime<FixedOffset>
. @djc Would it be akay to break this?
While I conceptually agree, I feel like semver does not allow us to break this documented guarantee in 0.4.x.
The rest of your proposed logic sounds good to me. Should clearly document that...
In addition to the above, would you consider parsing into NaiveDateTime
if "hour" but no "minute" directive is provided?
This comes up sometimes in polars, e.g. https://github.com/pola-rs/polars/issues/10178
@MarcoGorelli see #1094.