chrono icon indicating copy to clipboard operation
chrono copied to clipboard

Issue #660 whitespace exact

Open jtmoon79 opened this issue 2 years ago • 5 comments

be exact about whitespace

Be exact about whitespace in parsing. This changes pattern matching in format::parse::parse as it does not allow arbitrary whitespace before, after, or between the datetime specifiers.

format/parse.rs:datetime_from_str is exact about whitespace in the passed data s and passed strftime format fmt.

Also be more exacting about colons and whitespace around timezones. Instead of unlimited colons and whitespace, only match a more limited possible set of leading colons and whitespace. (See fn colon_or_space https://github.com/chronotope/chrono/pull/807/files#diff-679cb1eea69fd36c51536ec228e61d6b0e40ade95997877f6f853e6ad31de5f1R224-R227).

Also rename some test functions to better describe what is tested.


~To help reviewers understand how test results changed before and after, I created a branch Issue660_whitespace_exact_try6_prechange_testsonly that has added same set of test cases as this branch Issue660_whitespace_exact_try6, but does not have the underlying changes for whitespace and colons (i.e. Issue660_whitespace_exact_try6_prechange_testsonly is the "before significant changes" branch, and Issue660_whitespace_exact_try6 is the "after significant changes" branch)~ https://github.com/jtmoon79/chrono/compare/Issue660_whitespace_exact_try6_prechange_testsonly..Issue660_whitespace_exact_try6 ~The tests that result in different values are marked with // TESTDIFF.~


This is a lot of changes in this PR. IMO, it covers one idea so it suitable for one PR; colons and whitespace should be handled more exactly. The changes for both of these tended to be the same code lines, so I thought it was sensible to make the changes all in one commit. Hopefully the additional test cases provide enough confidence for accepting this significant change.


Thanks for contributing to chrono!

  • [ ] Have you added yourself and the change to the changelog? (Don't worry about adding the PR number)
  • [X] If this pull request fixes a bug, does it add a test that verifies that we can't reintroduce it?

I did not add to the CHANGELOG.md because there was no section 0.5 to add a new line item. I can add such a section if the reviewers would like. (In a prior review, it was recommended that these changes should wait until a significant version change number (like from 0.4... to 0.5). https://github.com/chronotope/chrono/issues/660#issuecomment-1224052050 )


Also see initial change proposal explanation at https://github.com/chronotope/chrono/issues/660#issuecomment-1221812907

This PR is a further draft of #786 .

jtmoon79 avatar Aug 31 '22 21:08 jtmoon79

@djc @esheppa any opinion on this PR?

jtmoon79 avatar Sep 02 '22 00:09 jtmoon79

@jtmoon79 both @esheppa and I are working on maintaining this project as volunteers, so when you start pinging us after 1 day it feels entitled. We will review this when we have time, usually that doesn't take more than a few days.

djc avatar Sep 02 '22 08:09 djc

@jtmoon79 both @esheppa and I are working on maintaining this project as volunteers, so when you start pinging us after 1 day it feels entitled. We will review this when we have time, usually that doesn't take more than a few days.

Pardon me. Yes, per your convenience.

jtmoon79 avatar Sep 02 '22 20:09 jtmoon79

Okay, this is way too big to review in one go. It can be all in one PR but then I'd like the big commit split up in more, smaller commits. Multiple PRs would also be a fine approach. Here's what I'm thinking:

  • One commit to add test cases that verify the current behavior (that will still pass after your changes)
  • One commit to make the actual code changes and relevant document changes, and changes any tests as needed
  • One commit to add further tests that verify the new behavior

Some further notes: please remove type annotations unless necessary for the compiler, don't use variable names like s_c (although c is fine if it is for a char) -- it seemed like you were using _c as a sort of Hungarian notation? Also for tests that test invalid input, please add a comment saying why the particular line of input is invalid.

djc avatar Sep 05 '22 08:09 djc

Okay, this is way too big to review in one go. It can be all in one PR but then I'd like the big commit split up in more, smaller commits.

This PR push is slightly different than your ask. It's currently 2 commits. I can revise into 3 commits if needed. Hopefully my forthcoming rationale is agreeable. 🙂

I'd like the big commit split up in more, smaller commits.

I made two large commits for easier comparison:

  • first commit that adds many test cases and no behavior changes aaf4c0a0be979b63586b530cf1e259376a818a38
  • second commit that adds behavior changes, tweaks test cases, adds few test cases 72d1d83258780c738b1c1a69ce5da27065db107f

Having this in two commits makes "before/after behavior comparison" easier for humans to grok, IMO.


I dropped PR creep around function name changes and a few other things.


One commit to add test cases that verify the current behavior (that will still pass after your changes)

Commit aaf4c0a0be979b63586b530cf1e259376a818a38

This adds a substantial number of tests. Some may seem redundant. But they are to help clarify behavior changes that will occur in the next commit.

Since there were significant changes to parse_internal in the next commit, I also added tests for other Items like, for example, Nanosecond3NoDot. e.g. for testing of Nanosecond3NoDot make sure test number lengths of 2 and 4 ("edges") and 6, 9 (other signifcant fraction lengths). These seemingly unrelated test cases gave me a little more confidence that upcoming changes did not affect adjacent parsing.


One commit to make the actual code changes and relevant document changes, and changes any tests as needed

Commit 72d1d83258780c738b1c1a69ce5da27065db107f

This introduces major parsing changes.

One notable change: I reduced timezone colon delimiters from arbitrary unlimited whitespace+colon to one or zero of patterns ":", " ", " :", ": ", or " : "., e.g. "0430", "04:30", "04 30", "04 :30", "04: 30", or "04 : 30" This seemed like a reasonable compromise of previous matching flexibility and desired matching exactness.

Some tests that appear added or removed might be just moved, e.g. the test case " 4 : 3 : 2.1 " here:

--- a/src/naive/time/tests.rs
+++ b/src/naive/time/tests.rs
@@ -196,9 +196,6 @@ fn test_date_from_str() {
         "0:0:0",
         "0:0:0.0000000",
         "0:0:0.0000003",
-        " 4 : 3 : 2.1 ",
-        " 09:08:07 ",
-        " 9:8:07 ",
         "01:02:03",
         "4:3:2.1",
         "9:8:7",
@@ -266,6 +263,18 @@ fn test_date_from_str() {
         "1441497364.649",    // valid datetime, not a NaiveTime
         "+1441497364.649",   // valid datetime, not a NaiveTime
         "+1441497364",       // valid datetime, not a NaiveTime
+        "01 :02:03",         // space after hour
+        "01: 02:03",         // space before minute
+        "01 : 02:03",        // space around hour-minute divider
+        "01:02 :03",         // space after minute
+        "01:02: 03",         // space before second
+        "01:02 : 03",        // space around minute-second divider
+        "01:02:03 .456",     // space after second
+        "01:02:03. 456",     // space before fraction
+        "01:02:03 ",         // trailing space
+        "01:02:03.456 ",     // trailing space
+        " 01:02:03",         // leading space
+        " 4 : 3 : 2.1 ",     // spaces intermixed throughout

One commit to add further tests that verify the new behavior

I thought this to be unnecessary for my "A/B comparison" approach.


please remove type annotations unless necessary for the compiler

Removed.


don't use variable names like s_c (although c is fine if it is for a char) -- it seemed like you were using _c as a sort of Hungarian notation?\

Renamed unused variable _c to _.

There are some places using c_ (a char). The c_ is to help humans distinguish from outer scope variable c (a char).


Also for tests that test invalid input, please add a comment saying why the particular line of input is invalid.

I added comment explanations for many invalid input tests. However, I did not add a comment for all invalid input tests. I was concerned that in some areas those comments would decrease grokability. Happy to add more comments to the test cases if requested.


I ran cargo fmt at the project top-level directory. This produced some unexpected alignments of trailing comments, e.g.

+        "2012 -12-12T12:12:12Z",                                     // space after year
+        "2012  -12-12T12:12:12Z",                                    // multi space after year
+        "2012- 12-12T12:12:12Z",                                     // space after year divider
+        "2012-  12-12T12:12:12Z", // multi space after year divider
+        "2012-12-12T 12:12:12Z",  // space after date-time divider

At first glance, this looks like a bug in rustfmt.

jtmoon79 avatar Sep 10 '22 08:09 jtmoon79

Regarding new change 3b87f82, see failed job rust_msrv (ubuntu-latest) https://github.com/chronotope/chrono/actions/runs/3278766675/jobs/5397533682

error[E0277]: the trait bound `&[&str; 37]: std::iter::IntoIterator` is not satisfied
   --> src/datetime/tests.rs:499:15
    |
499 |     for &s in &invalid {
    |               ^^^^^^^^ the trait `std::iter::IntoIterator` is not implemented for `&[&str; 37]`
    |
    = help: the following implementations were found:
              <&'a [T; _] as std::iter::IntoIterator>
              <&'a [T] as std::iter::IntoIterator>
              <&'a mut [T; _] as std::iter::IntoIterator>
              <&'a mut [T] as std::iter::IntoIterator>
    = note: required by `std::iter::IntoIterator::into_iter`

jtmoon79 avatar Oct 19 '22 05:10 jtmoon79

Latest PR push also includes a few extra tests in test_rfc3339 that were folded into a prior commit. Hence the force push https://github.com/chronotope/chrono/pull/807/files#diff-abd7a03adfeaa1c9ca4623371f789cf48c5d794d82457a4068ad5be00fe7c784R1458-R1492

jtmoon79 avatar Nov 26 '22 22:11 jtmoon79

FYI the last PR update was a rebase to main.

Unfortunately, command cargo test now prints several warnings regarding deprecated functions, e.g.

warning: use of deprecated associated function `offset::TimeZone::ymd_opt`: use `with_ymd_and_hms()` instead
   --> src/datetime/tests.rs:184:13
    |
184 |         edt.ymd_opt(2015, 2, 18)
    |             ^^^^^^^
    |
    = note: `#[warn(deprecated)]` on by default

If you'd like me to fix these then let me know.

jtmoon79 avatar Nov 27 '22 20:11 jtmoon79

Yes, please clean up any new uses of deprecated methods -- sorry about the churn here!

djc avatar Nov 28 '22 09:11 djc

@esheppa don't mean to pester, just wondering if you saw this updated PR? I know you both maintain this project on your own time, so whenever.

jtmoon79 avatar Dec 26 '22 17:12 jtmoon79

@esheppa ping, would you mind doing another round of review on this?

djc avatar Mar 03 '23 12:03 djc

Will get this reviewed tomorrow

esheppa avatar Mar 04 '23 11:03 esheppa

Thanks for your patience @jtmoon79. This is looking great. I think it makes sense to have any further changes done in subsequent PR's rather than this one, so I'm happy for this to go ahead and be merged. It looks like this will need a rebase before being able to be merged

esheppa avatar Mar 13 '23 00:03 esheppa

I think this is ready to merge.

I'm a little concerned about my decision to allow some flexibility in the timezone offset matching, e.g. patterns 00:00, 00 :00, 00: 00, 00 : 00 match (https://github.com/chronotope/chrono/commit/5bd127812226a51f3430b982cee54e8bb6eecfd2). Perhaps adding this little corner of wiggle room contradicts the purpose of this large change, and so could be a little confusing. I can rework this PR to simply be strict about the pattern, e.g. 00:00 only.

On the other hand, maybe just getting this PR merged would be better than putting off this big change any longer?

jtmoon79 avatar Mar 14 '23 22:03 jtmoon79

On a different topic, would you like help reviewing Issues and PRs for this project? It's an important project and it seems like some assistance would help. Since I have some familiarity now, I might be of some use. I could check-in on occasion and give (non-authoritative) feedback to PRs and Issues.

jtmoon79 avatar Mar 14 '23 22:03 jtmoon79

I agree it probably makes sense to leave the time whitespace as is for this PR, maybe revisit that in another one?

Would it be feasible to squash the current commit history down to something more or less equivalent to the original three commits, or some other sequence that doesn't have as much back and forth?

Your help with maintaining this project would be much appreciated!

djc avatar Mar 17 '23 12:03 djc

Would it be feasible to squash the current commit history down

Squashed.

Your help with maintaining this project would be much appreciated!

I'll subscribe to the github alerts. I'll comment on PRs where I can.

jtmoon79 avatar Mar 17 '23 22:03 jtmoon79