Unable to parse date with offset [Bug]

Open h8hawk opened this issue 2 years ago • 2 comments

The parser fails to parse 2021-09-22 12:30-1030 with the valid format %Y-%m-%d %H:%M%z. It raises the following exception:

ValueError: Not naive datetime (tzinfo is already set)

h8hawk avatar Jul 31 '21 06:07 h8hawk

dateparser.parse("Wed Nov 21 14:48:56 +0800 2018",date_formats=['%a %b %d %H:%M:%S %z %Y'])
ValueError: Not naive datetime (tzinfo is already set)

vvanglro avatar Nov 12 '21 03:11 vvanglro

Hello! I'm working on parsing dates and came across an issue with parsing with UTC offsets.

Bottom line up front: I don't think dateparser.parse() handles %z correctly, and some sort of fallback in the parser lets it recover, but with incorrect results. datetime.strptime() has no such recovery path, so it surfaces the errors where dateparser.parse() does not. I'm hoping someone who knows the system well can look at my debugging attempts and see where in the stack the issue lies. I also tested the input strings from @h8hawk and @vvanglro to see whether they work.

NOTE 1: I found that Python 3.12 added %:z. You'll see my screenshots below trying %:z in the strptime() function (parsing). After reviewing the documentation and some further testing in Python (not shown), I now know that %:z is an output-only format code for strftime(), as seen in the yellow-boxed error in one of the screenshots below. What originally put me on the path of using %:z as an input format is that it worked for me in dateparser.parse(). So every green-boxed format in the screenshots should return an error, yet these formats do not return an error when applied to my specific string with dateparser.parse(). I will come back to this.
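
On the parsing side, plain %z in the standard library accepts offsets written with or without a colon (Python 3.7 and later), so %:z is not needed to read a string such as -10:30. A minimal stdlib-only sketch, using the string from the original report plus a hypothetical colon variant of it:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M%z"

# Offset without a colon (string from the original report).
compact = datetime.strptime("2021-09-22 12:30-1030", FMT)

# Same instant with a colon in the offset (hypothetical variant for illustration).
with_colon = datetime.strptime("2021-09-22 12:30-10:30", FMT)

print(compact.isoformat())      # 2021-09-22T12:30:00-10:30
print(compact == with_colon)    # True
print(compact.strftime("%z"))   # -1030
```

This only shows what datetime.strptime() does by itself; it says nothing about how dateparser.parse() treats the same format.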

NOTE 2: You all are brave souls, and I'm super appreciative, for taking on a project dealing with dates and times. https://xkcd.com/2867/

First, environment stuff:

  • MacBook Pro, 15-inch, 2018, Intel processor
  • macOS 14.2.1
  • pyenv setup, running in the Terminal: Python 3.12.1 (main, Jan 5 2024, 22:53:21) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
  • dateparser version 1.2.0

The screenshot below shows my initial work-through of the issue when I couldn't get the result I expected from dateparser.parse(). The white arrow is an example of the datetime string I want to parse. Green boxes show format strings with %:z and blue boxes show format strings with %z. Purple arrows show correct results and red arrows show incorrect results (for one of the red arrows, the result carries my computer's current local offset, +0200, i.e. 'Europe/Athens', which is incorrect). Interestingly, you'll see that my string parses when both %z and %:z are left out of the format string (top of screenshot, not boxed).

I used try/except blocks because that's what I found in parse_with_formats in the dateparser code. As I read parse_with_formats, the loop keeps going after the continue. I assumed that if I passed a specific date_formats argument, that list would be the only thing tried. Since I passed a list with a single format string, I expected there would be nothing to continue to: the loop would exit and the result would be None. However, since %:z doesn't work in datetime.strptime() and raises a ValueError, parsing must continue with formats that I don't see, because it returns a good result. I suspect that result is the same one produced by the format string without any %z or %:z (top of screenshot, not boxed).
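
The expectation and the suspected behavior described above can be sketched with the standard library alone. This is not dateparser's code: try_formats, parse_with_fallback, and FALLBACK_FORMATS are hypothetical names, and the fallback list merely stands in for whatever else might be tried after the explicit formats fail.

```python
from datetime import datetime


def try_formats(text, formats):
    """Return the first successful strptime() result, or None if all formats fail."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    return None


# Hypothetical fallback list, standing in for "formats that I don't see".
FALLBACK_FORMATS = ["%Y-%m-%d %H:%M"]


def parse_with_fallback(text, user_formats):
    """Try the caller's formats first, then quietly fall back (the hypothesis)."""
    result = try_formats(text, user_formats)
    if result is None:
        result = try_formats(text, FALLBACK_FORMATS)
    return result


text = "2021-09-22 12:30"
print(try_formats(text, ["%d/%m/%Y"]))          # None
print(parse_with_fallback(text, ["%d/%m/%Y"]))  # 2021-09-22 12:30:00
```

With only the explicit list, a non-matching format yields None. With a fallback stage, the same call returns a value, and the caller cannot tell that the requested format never matched.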

[Screenshot: My code attempting to isolate the issue]

In this next screenshot, I tried to see whether the strings from @h8hawk and @vvanglro parse the way mine did. Again, blue boxes for %z, green boxes for %:z, purple arrows for correct results, and red arrows for incorrect results. I think what we're seeing here is that my offset is computed by further format strings that are tried, and do match, after the one I explicitly pass in fails. I tested the @h8hawk and @vvanglro strings in multiple variations (some not shown), and these also return incorrect results (or no results at all).

[Screenshot: Code attempting to recreate my issue with @h8hawk and @vvanglro inputs]
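
For reference, a minimal stdlib-only sketch (it does not involve dateparser) showing that datetime.strptime() by itself accepts both of those strings with %z and returns timezone-aware datetimes. It assumes the default C/English locale for %a and %b.

```python
from datetime import datetime

# (input string, format) pairs taken from the two earlier comments.
samples = [
    ("2021-09-22 12:30-1030", "%Y-%m-%d %H:%M%z"),
    ("Wed Nov 21 14:48:56 +0800 2018", "%a %b %d %H:%M:%S %z %Y"),
]

results = [datetime.strptime(text, fmt) for text, fmt in samples]

for parsed in results:
    # tzinfo is set because %z matched an explicit offset in the input.
    print(parsed.isoformat(), parsed.tzinfo is not None)
# 2021-09-22T12:30:00-10:30 True
# 2018-11-21T14:48:56+08:00 True
```

Both results are already aware, which is at least consistent with the "tzinfo is already set" wording of the reported error. Whether that is what dateparser trips over is an assumption, not something confirmed here.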

I thought maybe some of the settings I passed were helping to parse my string but not the strings from @h8hawk and @vvanglro. In this final screenshot, we can see this is not the case: they still don't parse.

[Screenshot: Applying settings I used to strings from @h8hawk and @vvanglro to see if it works differently]

I hope this was helpful--I cannot go much further with my current knowledge.

mrmattson avatar Jan 06 '24 15:01 mrmattson