chrono
chrono copied to clipboard
Issue #660 whitespace exact
be exact about whitespace
Be exact about whitespace in parsing. This changes
pattern matching in format::parse::parse
as it does not allow
arbitrary whitespace before, after, or between the datetime specifiers.
format/parse.rs:datetime_from_str
is exact about whitespace in
the passed data s
and passed strftime format fmt
.
Also be more exacting about colons and whitespace around timezones.
Instead of unlimited colons and whitespace, only match a more limited
possible set of leading colons and whitespace.
(See fn colon_or_space
https://github.com/chronotope/chrono/pull/807/files#diff-679cb1eea69fd36c51536ec228e61d6b0e40ade95997877f6f853e6ad31de5f1R224-R227).
Also rename some test functions to better describe what is tested.
~To help reviewers understand how test results changed before and after, I created a branch Issue660_whitespace_exact_try6_prechange_testsonly
that has added same set of test cases as this branch Issue660_whitespace_exact_try6
, but does not have the underlying changes for whitespace and colons (i.e. Issue660_whitespace_exact_try6_prechange_testsonly
is the "before significant changes" branch, and Issue660_whitespace_exact_try6
is the "after significant changes" branch)~ https://github.com/jtmoon79/chrono/compare/Issue660_whitespace_exact_try6_prechange_testsonly..Issue660_whitespace_exact_try6
~The tests that result in different values are marked with // TESTDIFF
.~
This is a lot of changes in this PR. IMO, it covers one idea so it suitable for one PR; colons and whitespace should be handled more exactly. The changes for both of these tended to be the same code lines, so I thought it was sensible to make the changes all in one commit. Hopefully the additional test cases provide enough confidence for accepting this significant change.
Thanks for contributing to chrono!
- [ ] Have you added yourself and the change to the changelog? (Don't worry about adding the PR number)
- [X] If this pull request fixes a bug, does it add a test that verifies that we can't reintroduce it?
I did not add to the CHANGELOG.md
because there was no section 0.5
to add a new line item. I can add such a section if the reviewers would like.
(In a prior review, it was recommended that these changes should wait until a significant version change number (like from 0.4...
to 0.5
). https://github.com/chronotope/chrono/issues/660#issuecomment-1224052050 )
Also see initial change proposal explanation at https://github.com/chronotope/chrono/issues/660#issuecomment-1221812907
This PR is a further draft of #786 .
@djc @esheppa any opinion on this PR?
@jtmoon79 both @esheppa and I are working on maintaining this project as volunteers, so when you start pinging us after 1 day it feels entitled. We will review this when we have time, usually that doesn't take more than a few days.
@jtmoon79 both @esheppa and I are working on maintaining this project as volunteers, so when you start pinging us after 1 day it feels entitled. We will review this when we have time, usually that doesn't take more than a few days.
Pardon me. Yes, per your convenience.
Okay, this is way too big to review in one go. It can be all in one PR but then I'd like the big commit split up in more, smaller commits. Multiple PRs would also be a fine approach. Here's what I'm thinking:
- One commit to add test cases that verify the current behavior (that will still pass after your changes)
- One commit to make the actual code changes and relevant document changes, and changes any tests as needed
- One commit to add further tests that verify the new behavior
Some further notes: please remove type annotations unless necessary for the compiler, don't use variable names like s_c
(although c
is fine if it is for a char
) -- it seemed like you were using _c
as a sort of Hungarian notation? Also for tests that test invalid input, please add a comment saying why the particular line of input is invalid.
Okay, this is way too big to review in one go. It can be all in one PR but then I'd like the big commit split up in more, smaller commits.
This PR push is slightly different than your ask. It's currently 2 commits. I can revise into 3 commits if needed. Hopefully my forthcoming rationale is agreeable. 🙂
I'd like the big commit split up in more, smaller commits.
I made two large commits for easier comparison:
- first commit that adds many test cases and no behavior changes aaf4c0a0be979b63586b530cf1e259376a818a38
- second commit that adds behavior changes, tweaks test cases, adds few test cases 72d1d83258780c738b1c1a69ce5da27065db107f
Having this in two commits makes "before/after behavior comparison" easier for humans to grok, IMO.
I dropped PR creep around function name changes and a few other things.
One commit to add test cases that verify the current behavior (that will still pass after your changes)
Commit aaf4c0a0be979b63586b530cf1e259376a818a38
This adds a substantial number of tests. Some may seem redundant. But they are to help clarify behavior changes that will occur in the next commit.
Since there were significant changes to parse_internal
in the next commit, I also added tests for other Item
s like, for example, Nanosecond3NoDot
. e.g. for testing of Nanosecond3NoDot
make sure test number lengths of 2 and 4 ("edges") and 6, 9 (other signifcant fraction lengths).
These seemingly unrelated test cases gave me a little more confidence that upcoming changes did not affect adjacent parsing.
One commit to make the actual code changes and relevant document changes, and changes any tests as needed
Commit 72d1d83258780c738b1c1a69ce5da27065db107f
This introduces major parsing changes.
One notable change: I reduced timezone colon delimiters from arbitrary unlimited whitespace+colon to one or zero of
patterns ":"
, " "
, " :"
, ": "
, or " : "
., e.g. "0430"
, "04:30"
, "04 30"
, "04 :30"
, "04: 30"
, or "04 : 30"
This seemed like a reasonable compromise of previous matching flexibility and desired matching exactness.
Some tests that appear added or removed might be just moved,
e.g. the test case " 4 : 3 : 2.1 "
here:
--- a/src/naive/time/tests.rs
+++ b/src/naive/time/tests.rs
@@ -196,9 +196,6 @@ fn test_date_from_str() {
"0:0:0",
"0:0:0.0000000",
"0:0:0.0000003",
- " 4 : 3 : 2.1 ",
- " 09:08:07 ",
- " 9:8:07 ",
"01:02:03",
"4:3:2.1",
"9:8:7",
@@ -266,6 +263,18 @@ fn test_date_from_str() {
"1441497364.649", // valid datetime, not a NaiveTime
"+1441497364.649", // valid datetime, not a NaiveTime
"+1441497364", // valid datetime, not a NaiveTime
+ "01 :02:03", // space after hour
+ "01: 02:03", // space before minute
+ "01 : 02:03", // space around hour-minute divider
+ "01:02 :03", // space after minute
+ "01:02: 03", // space before second
+ "01:02 : 03", // space around minute-second divider
+ "01:02:03 .456", // space after second
+ "01:02:03. 456", // space before fraction
+ "01:02:03 ", // trailing space
+ "01:02:03.456 ", // trailing space
+ " 01:02:03", // leading space
+ " 4 : 3 : 2.1 ", // spaces intermixed throughout
One commit to add further tests that verify the new behavior
I thought this to be unnecessary for my "A/B comparison" approach.
please remove type annotations unless necessary for the compiler
Removed.
don't use variable names like s_c (although c is fine if it is for a char) -- it seemed like you were using _c as a sort of Hungarian notation?\
Renamed unused variable _c
to _
.
There are some places using c_
(a char
). The c_
is to help humans distinguish from outer scope variable c
(a char
).
Also for tests that test invalid input, please add a comment saying why the particular line of input is invalid.
I added comment explanations for many invalid input tests. However, I did not add a comment for all invalid input tests. I was concerned that in some areas those comments would decrease grokability. Happy to add more comments to the test cases if requested.
I ran cargo fmt
at the project top-level directory.
This produced some unexpected alignments of trailing comments, e.g.
+ "2012 -12-12T12:12:12Z", // space after year
+ "2012 -12-12T12:12:12Z", // multi space after year
+ "2012- 12-12T12:12:12Z", // space after year divider
+ "2012- 12-12T12:12:12Z", // multi space after year divider
+ "2012-12-12T 12:12:12Z", // space after date-time divider
At first glance, this looks like a bug in rustfmt
.
Regarding new change 3b87f82, see failed job rust_msrv (ubuntu-latest) https://github.com/chronotope/chrono/actions/runs/3278766675/jobs/5397533682
error[E0277]: the trait bound `&[&str; 37]: std::iter::IntoIterator` is not satisfied
--> src/datetime/tests.rs:499:15
|
499 | for &s in &invalid {
| ^^^^^^^^ the trait `std::iter::IntoIterator` is not implemented for `&[&str; 37]`
|
= help: the following implementations were found:
<&'a [T; _] as std::iter::IntoIterator>
<&'a [T] as std::iter::IntoIterator>
<&'a mut [T; _] as std::iter::IntoIterator>
<&'a mut [T] as std::iter::IntoIterator>
= note: required by `std::iter::IntoIterator::into_iter`
Latest PR push also includes a few extra tests in test_rfc3339
that were folded into a prior commit. Hence the force push
https://github.com/chronotope/chrono/pull/807/files#diff-abd7a03adfeaa1c9ca4623371f789cf48c5d794d82457a4068ad5be00fe7c784R1458-R1492
FYI the last PR update was a rebase to main
.
Unfortunately, command cargo test
now prints several warnings regarding deprecated functions, e.g.
warning: use of deprecated associated function `offset::TimeZone::ymd_opt`: use `with_ymd_and_hms()` instead
--> src/datetime/tests.rs:184:13
|
184 | edt.ymd_opt(2015, 2, 18)
| ^^^^^^^
|
= note: `#[warn(deprecated)]` on by default
If you'd like me to fix these then let me know.
Yes, please clean up any new uses of deprecated methods -- sorry about the churn here!
@esheppa don't mean to pester, just wondering if you saw this updated PR? I know you both maintain this project on your own time, so whenever.
@esheppa ping, would you mind doing another round of review on this?
Will get this reviewed tomorrow
Thanks for your patience @jtmoon79. This is looking great. I think it makes sense to have any further changes done in subsequent PR's rather than this one, so I'm happy for this to go ahead and be merged. It looks like this will need a rebase before being able to be merged
I think this is ready to merge.
I'm a little concerned about my decision to allow some flexibility in the timezone offset matching, e.g. patterns 00:00
, 00 :00
, 00: 00
, 00 : 00
match (https://github.com/chronotope/chrono/commit/5bd127812226a51f3430b982cee54e8bb6eecfd2). Perhaps adding this little corner of wiggle room contradicts the purpose of this large change, and so could be a little confusing. I can rework this PR to simply be strict about the pattern, e.g. 00:00
only.
On the other hand, maybe just getting this PR merged would be better than putting off this big change any longer?
On a different topic, would you like help reviewing Issues and PRs for this project? It's an important project and it seems like some assistance would help. Since I have some familiarity now, I might be of some use. I could check-in on occasion and give (non-authoritative) feedback to PRs and Issues.
I agree it probably makes sense to leave the time whitespace as is for this PR, maybe revisit that in another one?
Would it be feasible to squash the current commit history down to something more or less equivalent to the original three commits, or some other sequence that doesn't have as much back and forth?
Your help with maintaining this project would be much appreciated!
Would it be feasible to squash the current commit history down
Squashed.
Your help with maintaining this project would be much appreciated!
I'll subscribe to the github alerts. I'll comment on PRs where I can.