dateparse
dateparse copied to clipboard
Comprehensive validation 🔎, 30+ fixes integrated/added 🔨🐛, optimized performance 🚀
This package is amazing and hugely popular, and has been the best package for automatic date parsing in go for years! ⭐
Thanks @araddon for crafting this package with love over the years!!
I've been using this while developing a new cloud-based log aggregation/search/visualization product, and I've found that there are three major opportunities for improvement for my particular use case:
- The package does not strictly validate its input, leading to many false positives. This is OK if you know for sure the input matches one of the known formats, but cannot be trusted if the input could be anything and you only want a returned date/time if it definitely matches a known format.
- While still being far more efficient than the "shotgun" parsing approach, the package currently allocates a relatively large amount of memory (several times the average input size), which can add up when parsing megabytes of date strings per second in a high-throughput microservice. It can also be relatively slow when parsing a string that doesn't match a known format and can allocate even more memory in this case, due to custom error messages that include contextual details.
- There are a lot of unmerged one-off contributions that haven't been merged and need to be made coherent with each other.
This PR addresses all 3 opportunities:
- Validation: it now comprehensively validates the input in the following ways: at each point in the state machine, it makes sure there are cases for all possibilities, and any invalid possibility will fail; additionally, it makes sure that the entire format string was specifically set (excluding any trailing punctuation, which can be safely ignored). False positives should be extremely rare now (hard to prove they don't exist).
-
Memory Efficiency: bytes allocated have been reduced by 90%, and parsing some formats are zero allocation. The
SimpleErrorMessages
option was added (off by default) that greatly speeds up the case where a string does not match a known format -- with the option on, this case is now 4x faster and produces almost no allocations. - Merge & Integrate Community Fixes: Fixes for nearly all of the pending issues from the community (including open pull requests) have been incorporated and adapted.
In the process of going through the state machine comprehensively for validation, redundant code/states were merged, and support was added for certain edge cases (for example, some date formats did not support being followed by times).
The example and README.md were updated to incorporate all of the newly supported formats and edge cases. More details on how to properly interpret returned location information with respect to abbreviated timezones was added.
BREAKING -- the package now requires go >= 1.20 to support memory optimizations converting from []byte
to string
in key places.
A huge thanks to all who posted issues and contributed PRs -- while the PRs were unable to be merged directly because the validation changes were so major, the ideas of all these contributions and the associated test cases were incorporated. Here's credit for all of the issues fixes and contributions in this PR as well as a summary of additional fixes added:
- This PR started directly building from https://github.com/araddon/dateparse/pull/151 by @arran4 that fixes https://github.com/araddon/dateparse/issues/78 (support for GMT+offset, support additional whitespace in certain places, and other changes)
- Fix https://github.com/araddon/dateparse/issues/145 - words falsely detected as a valid format - fixed by comprehensive validation
- Fix https://github.com/araddon/dateparse/issues/108 -
"1.jpg"
incorrectly parsed as valid - fixed by comprehensive validation - Fix https://github.com/araddon/dateparse/issues/98 - names and addresses incorrectly parsed as valid - fixed by comprehensive validation
- Fix https://github.com/araddon/dateparse/issues/150 - fix parsing "am" as "pm"
- Fix https://github.com/araddon/dateparse/issues/157 - fix parsing of
Thu Jan 28 2021 15:28:21 GMT+0000
format - Fix https://github.com/araddon/dateparse/issues/137 - variant of ':' as separator for fraction seconds - adapted https://github.com/araddon/dateparse/pull/138 by @dferstay
- Fix https://github.com/araddon/dateparse/issues/139 - add support for dd-mm-yyyy (digit month) - adapt https://github.com/araddon/dateparse/pull/140 by @dferstay (also fixes https://github.com/araddon/dateparse/issues/155)
- Fix https://github.com/araddon/dateparse/issues/141 - add support for
yyyy mon dd
format - adapt https://github.com/araddon/dateparse/pull/142 by @dferstay - Fix https://github.com/araddon/dateparse/issues/143 - add support for
yyyymmddhhmmss.SSS
format - adapt https://github.com/araddon/dateparse/pull/144 by @dferstay - Fix https://github.com/araddon/dateparse/issues/130 - fix timezone parsing after fractional seconds - adapted https://github.com/araddon/dateparse/pull/131 by @zifengyu
- Fix https://github.com/araddon/dateparse/issues/123 - fix parsing of fractional seconds for certain formats (e.g.,
2017-04-03 22:32:14.322 CET
) - Fix https://github.com/araddon/dateparse/issues/109 - fix parsing of
Sun, 07 Jun 2020 00:00:00 +0100
format - Fix bug in comment https://github.com/araddon/dateparse/issues/100#issuecomment-1118868154 - fix parsing of
1 April 2022 23:59
format (time after certain date formats) - Fix https://github.com/araddon/dateparse/issues/129 - fix ambiguous and PreferMonthFirst parsing for format
mm.dd.yyyy (time)
- adapted https://github.com/araddon/dateparse/pull/133 by @mehanizm (also fixes https://github.com/araddon/dateparse/issues/91 and https://github.com/araddon/dateparse/issues/28) - Fix https://github.com/araddon/dateparse/issues/149 - support
PMDT
andAMT
time zones and validate that AM/PM indicators only appear at most once - Fix https://github.com/araddon/dateparse/issues/127 - add support for
dd[th,nd,st,rd] Month yyyy
format - adapt https://github.com/araddon/dateparse/pull/128 by @krhubert - Fix https://github.com/araddon/dateparse/issues/158 - fix parsing for format
(time) UTC[+-]NNNN
- Add support for
mm/dd/yyyy, hh:mm:ss
format - adapt https://github.com/araddon/dateparse/pull/156 by @BrianLeishman - Add support for
yyyy.mm.dd (time)
format - adapt https://github.com/araddon/dateparse/pull/134 by @jmdacruz, and add cases expected to fail toTestParseErrors
unit test - Add support for git log format (e.g.,
Thu Apr 7 15:13:13 2005 -0700
) - adapt commit https://github.com/araddon/dateparse/pull/92/commits/99d9682a1cbe7a14975b5b71704af8847c2684f9 from https://github.com/araddon/dateparse/pull/92 by @jiangxin (merge timeWsYearOffset case and add validation) - Add support for RabbitMQ log format (e.g.,
dd-mon-yyyy::hh:mm:ss
) - adapt https://github.com/araddon/dateparse/pull/122 by @bizy01 - Expand support for Chinese date formats, and allow times to follow - adapt https://github.com/araddon/dateparse/pull/132 by @xwjdsh
- Add support for
mon/dd/yyyy
format, e.g.,Oct/31/1970
- Add support for
dd-month-year
format - Extend format
yyyy-mon-dd
to allow times to follow it. Also allow full month name instead of just abbreviated. - Fix the case for ambiguous date/time in the
mm:dd:yyyy
format - Allow full day name before month (e.g.,
Monday January 4th, 2017
) - Additional fixes for
mm.dd.yyyy (time)
format - Fix ambiguous parsing for
mm/dd
formats that start with a weekday
Also adds tests to verify that the following stay fixed:
- https://github.com/araddon/dateparse/issues/94 - mysql log format such as
190910 11:51:49
- https://github.com/araddon/dateparse/issues/117 - fractional seconds after ':'
Great work @klondikedragon
this is great work @klondikedragon! Now, this repo hasn't seen much movement in years, do you think we should start using a fork? should we use yours?
I would vote that we use his, we should see if it qualifies for https://github.com/avelino/awesome-go
In some further testing, I found that weekday prefixes only worked for some date formats, but not for others. So that is fixed now. As a side effect (benefit?), leading whitespace is now allowed/ignored.
Let's see if @araddon has feedback and/or is interested in merging this PR (it's a pretty big change and changes the philosophy a bit to have validation, and also makes the code a little more complex in favor of performance). The changes are large enough now it could break backwards compatibility, so in the very least it should deserve a new major version IMO.
Although I don't want to fork something lightly, since we haven't heard any feedback from @araddon for a few years, it could definitely make sense to go ahead. If there is no comment after the holidays, I think it would make sense to go ahead and fork. This package is a key part of freeform date parsing in the "automatic structured field extraction" logic being built for IT Lightning (this new cloud-based log management platform I'm building). Given that's the case, the IT Lightning org would be willing to maintain the forked repo and work with community issues/contributions, since we're motivated to have best-in-class date recognition & parsing in the log ingestion pipeline. The license would remain the same of course.
The community contributions would help us improve our date parsing, we'd be motivated to put energy into it to keep our date parsing bug-free and comprehensive, and community use of the package might help us get a little exposure to devs/SREs who might become interested in our log management solution. So it should benefit everyone.
All feedback is welcome. What do ya'll think of this proposed plan?
great work @klondikedragon . How can i start using this?
I'll go ahead and fork this package. I'm renaming the main branch as part of that.
The fork is complete and published as v0.1.0
-- again, a huge thanks to @araddon for authoring and maintaining this package for so many years!
The fork is available using go get github.com/itlightning/dateparse
-- issues and PRs are welcome.
@elliot40404 @arran4 @jmdacruz -- see what you think and how this updated package works! If this looks good and after incorporating feedback, I think I'll publish a v1.0.0 at some point soon. I'm also curious to get feedback on my log management project too, check out the site/discord if you're interested. Thanks!