dateparser
Improve performance when parsing many strings in the same format
Proposal for https://github.com/sisyphsu/dateparser/issues/17 .
By keeping track of which rules were used to parse the first string, the parser can first try each following string with a matcher that only uses a subset of those rules, and fall back to the full matcher if that fails.
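To illustrate the idea, here is a minimal sketch and not the code in this PR. The class `RuleCachingSketch`, its fields, and the use of plain `java.util.regex` patterns as "rules" are all assumptions made for the example; the real parser's rule and matcher types are different. It remembers which rules matched during the last full parse, tries only those on the next input, and falls back to the full rule set when the reduced set cannot consume the input.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; names and structure are not taken from dateparser.
class RuleCachingSketch {
    private final List<Pattern> allRules;
    // Rules that matched during the most recent full parse, in original priority order.
    private List<Pattern> lastUsed = new ArrayList<>();
    // Counts how often the reduced rule set was enough (exposed for demonstration).
    int fastPathHits = 0;

    RuleCachingSketch(List<Pattern> allRules) {
        this.allRules = allRules;
    }

    // Splits the input into tokens; throws if no combination of rules consumes it.
    List<String> parse(String input) {
        if (!lastUsed.isEmpty()) {
            List<String> tokens = tryParse(input, lastUsed, new HashSet<>());
            if (tokens != null) {
                fastPathHits++;
                return tokens;
            }
        }
        // Fall back to the full rule set and remember which rules were needed.
        Set<Pattern> used = new HashSet<>();
        List<String> tokens = tryParse(input, allRules, used);
        if (tokens == null) {
            throw new IllegalArgumentException("Unparseable input: " + input);
        }
        List<Pattern> reduced = new ArrayList<>();
        for (Pattern rule : allRules) {
            if (used.contains(rule)) {
                reduced.add(rule);
            }
        }
        lastUsed = reduced;
        return tokens;
    }

    // Greedily matches rules from left to right; returns null if the input gets stuck.
    private static List<String> tryParse(String input, List<Pattern> rules, Set<Pattern> used) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Pattern rule : rules) {
                Matcher m = rule.matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > pos) {
                    tokens.add(m.group());
                    used.add(rule);
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                return null;
            }
        }
        return tokens;
    }
}
```

In this sketch the speed-up comes from looping over fewer rules per position. It does not guarantee that the reduced rule set produces the same result as the full one for every input, which is the kind of discrepancy an equality test on repeated parsing would expose.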
The benchmark case runs between 2 and 3 times faster on my machine:
Benchmark                                                           Mode  Cnt     Score     Error  Units
OptimizeForReuseSimilarFormattedBenchmark.optimizedForReuseParser   avgt    6   462.362 ±  54.300  ms/op
OptimizeForReuseSimilarFormattedBenchmark.regularParser             avgt    6  1130.171 ± 162.117  ms/op
I tried to make the code a bit clearer by adding some comments and doing some more cleanup.
Let me know if any specific parts are still unclear.
Looks like this PR isn't ready to be merged. The following test fails:
@Test
void foo() {
    DateParser parser = DateParser.newBuilder().optimizeForReuseSimilarFormatted(true).build();
    String inputString = "2022-08-09 19:04:31.600000+00:00";
    assertEquals(parser.parseDate(inputString), parser.parseDate(inputString));
}
I'm afraid it will require a fix for https://github.com/sisyphsu/dateparser/issues/29 first.