
Improve performance when parsing many strings in the same format

Open robin-xyzt-ai opened this issue 2 years ago • 2 comments

Proposal for https://github.com/sisyphsu/dateparser/issues/17 .

By keeping track of which rules were used to parse the first string, the parser can try subsequent strings with a matcher restricted to just those rules, which is a small subset of the full rule set.
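As a rough illustration of the idea (not the code in this PR), the sketch below remembers which format succeeded last and tries it first on the next input, falling back to the full list only when it fails. The class name `ReusingParserSketch`, the three formats, and the `attempts` counter are assumptions made up for this example. It uses `java.time` formatters as stand-ins for dateparser's rules, and the fallback step is part of the sketch rather than something stated above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

/** Hypothetical sketch of "remember what worked last time"; not dateparser's implementation. */
final class ReusingParserSketch {
    // Stand-ins for the full rule set (assumed formats, chosen only for illustration).
    private final List<DateTimeFormatter> allFormats = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,            // e.g. 2023-01-11
            DateTimeFormatter.ofPattern("dd/MM/yyyy"),   // e.g. 11/01/2023
            DateTimeFormatter.BASIC_ISO_DATE);           // e.g. 20230111

    // The format that parsed the previous input, or null before the first success.
    private DateTimeFormatter lastSuccessful;

    // Number of individual format attempts so far; only here to make the saving visible.
    int attempts;

    LocalDate parse(String input) {
        // Fast path: try the format that worked for the previous string.
        if (lastSuccessful != null) {
            LocalDate date = tryParse(lastSuccessful, input);
            if (date != null) {
                return date;
            }
        }
        // Slow path: fall back to every other known format.
        for (DateTimeFormatter format : allFormats) {
            if (format == lastSuccessful) {
                continue; // already tried on the fast path
            }
            LocalDate date = tryParse(format, input);
            if (date != null) {
                lastSuccessful = format;
                return date;
            }
        }
        throw new DateTimeParseException("No known format matches", input, 0);
    }

    // Returns null instead of throwing so the caller can move on to the next format.
    private LocalDate tryParse(DateTimeFormatter format, String input) {
        attempts++;
        try {
            return LocalDate.parse(input, format);
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

With inputs that all share one format, every call after the first costs a single attempt instead of a scan over all formats, which is the kind of saving the benchmark below measures for the real rule-based matcher.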

The case in the benchmark is between 2 and 3 times faster on my machine:

    Benchmark                                                          Mode  Cnt     Score     Error  Units
    OptimizeForReuseSimilarFormattedBenchmark.optimizedForReuseParser  avgt    6   462.362 ±  54.300  ms/op
    OptimizeForReuseSimilarFormattedBenchmark.regularParser            avgt    6  1130.171 ± 162.117  ms/op

robin-xyzt-ai, Jan 11 '23 14:01

I tried to make the code clearer by adding some comments and doing a bit more cleanup.

Let me know if there are specific parts that are still unclear.

robin-xyzt-ai, Feb 17 '23 15:02

Looks like this PR isn't ready to be merged. The following test fails:

    @Test
    void foo() {
        DateParser parser = DateParser.newBuilder().optimizeForReuseSimilarFormatted(true).build();
        String inputString = "2022-08-09 19:04:31.600000+00:00";
        assertEquals(parser.parseDate(inputString), parser.parseDate(inputString));
    }

I'm afraid it will require a fix for https://github.com/sisyphsu/dateparser/issues/29 first.

robin-xyzt-ai, Feb 17 '23 17:02