Unneeded patterns/rules influence the result of the parsing
This might be an issue with retree rather than with the dateparser though.
The following test (which you cannot execute via the public API) fails:

    @Test
    public void parserWithLimitedPatterns() {
        List<String> rules = Arrays.asList(
            "(?<year>\\d{4})\\W{1}(?<month>\\d{1,2})\\W{1}(?<day>\\d{1,2})[^\\d]?",
            "\\W*(?:at )?(?<hour>\\d{1,2}):(?<minute>\\d{1,2})(?::(?<second>\\d{1,2}))?(?:[.,](?<ns>\\d{1,9}))?(?<zero>z)?",
            " ?(?<zoneOffset>[-+]\\d{1,2}:?(?:\\d{2})?)"
        );
        DateParser dateParser = new DateParser(rules, new HashSet<>(rules), Collections.emptyMap(), true, false);
        String input = "2022-08-09 19:04:31.600000+00:00";
        Date date = dateParser.parseDate(input);
        assertEquals(parser.parseDate(input), date);
    }

Note how those 3 rules should be sufficient to parse the date.
- There is a rule for the year-month-day part
- There is a rule for the hours:minutes:seconds.ns part
- There is a rule for the zone offset part
However, during parsing the `zoneOffset` rule is never used. Instead, the parser applies the rule for the hours twice.
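To make the expectation above concrete, here is a minimal sketch that checks which of the three rules match at each position of the input. It uses plain `java.util.regex` rather than retree, so it says nothing definitive about how retree selects a rule. The class name `RuleOverlapSketch` and the helper `matchEnds` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleOverlapSketch {

    // The three rules from the failing test, in the same order.
    static final List<String> RULES = Arrays.asList(
        "(?<year>\\d{4})\\W{1}(?<month>\\d{1,2})\\W{1}(?<day>\\d{1,2})[^\\d]?",
        "\\W*(?:at )?(?<hour>\\d{1,2}):(?<minute>\\d{1,2})(?::(?<second>\\d{1,2}))?(?:[.,](?<ns>\\d{1,9}))?(?<zero>z)?",
        " ?(?<zoneOffset>[-+]\\d{1,2}:?(?:\\d{2})?)"
    );

    // For each rule, returns the end offset of a match anchored at `offset`,
    // or -1 if the rule does not match there.
    static List<Integer> matchEnds(String input, int offset) {
        List<Integer> ends = new ArrayList<>();
        for (String rule : RULES) {
            Matcher m = Pattern.compile(rule).matcher(input);
            m.region(offset, input.length());
            ends.add(m.lookingAt() ? m.end() : -1);
        }
        return ends;
    }

    public static void main(String[] args) {
        String input = "2022-08-09 19:04:31.600000+00:00";
        // Offsets where the date, the time, and the zone offset start.
        for (int offset : new int[] {0, 11, 26}) {
            System.out.println(offset + " -> " + matchEnds(input, offset));
        }
        // prints:
        // 0 -> [11, -1, -1]
        // 11 -> [-1, 26, -1]
        // 26 -> [-1, 32, 32]
    }
}
```

With plain `java.util.regex`, both the time rule and the `zoneOffset` rule match the trailing `+00:00` with the same length, because `\W*` can consume the `+`. Whether this tie is what makes the parser pick the time rule a second time is an assumption; nothing here confirms how retree resolves it.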
The weird thing is that when I add a rule that should not be used (`" ?(?<year>\\\\d{4})$"`), the test passes:

    @Test
    public void parserWithLimitedPatterns() {
        List<String> rules = Arrays.asList(
            "(?<year>\\d{4})\\W{1}(?<month>\\d{1,2})\\W{1}(?<day>\\d{1,2})[^\\d]?",
            " ?(?<year>\\\\d{4})$",
            "\\W*(?:at )?(?<hour>\\d{1,2}):(?<minute>\\d{1,2})(?::(?<second>\\d{1,2}))?(?:[.,](?<ns>\\d{1,9}))?(?<zero>z)?",
            " ?(?<zoneOffset>[-+]\\d{1,2}:?(?:\\d{2})?)"
        );
        DateParser dateParser = new DateParser(rules, new HashSet<>(rules), Collections.emptyMap(), true, false);
        String input = "2022-08-09 19:04:31.600000+00:00";
        Date date = dateParser.parseDate(input);
        assertEquals(parser.parseDate(input), date);
    }

The position where I add that additional rule matters. For example, adding it at the end of the list instead of at index 1 makes the test fail again.
I bumped into this issue while working on PR https://github.com/sisyphsu/dateparser/pull/28, where I am trying to reduce the number of rules used for parsing to improve performance.