dateparser
dateparser copied to clipboard
Parse unix timestamps consistently regardless of timezones
When dateparser
parses a unix timestamp, and the user has set a TIMEZONE
that is not local
(or the same as the user's system local timezone), it gets the wrong time. This is because datetime.fromtimestamp(timestamp)
will return a datetime
in the user's local timezone as configured by their system (influenced by the TZ
env var, /etc/localtime
, etc). But, it will be a naive datetime
without any timezone information . The dateparser
code will try to interpret that datetime
object as if it was in the settings.TIMEZONE
timezone, which is incorrect.
The fix is to use datetime.fromtimestamp(timestamp, timezone_that_matches_settings_TIMEZONE)
to get a timezone aware datetime
object with a timezone that matches dateparser's settings.TIMEZONE
(or use the local timezone if that is unset). Then strip the tzinfo
from that datetime
object, because apply_timezone_from_settings(date_obj, settings)
requires date_obj
to be a naive datetime
without timezone information. So when apply_timezone_from_settings
interprets that object, it does so correctly.
Codecov Report
Merging #954 (6934648) into master (0ed979e) will increase coverage by
0.00%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #954 +/- ##
=======================================
Coverage 98.29% 98.29%
=======================================
Files 234 234
Lines 2702 2705 +3
=======================================
+ Hits 2656 2659 +3
Misses 46 46
Impacted Files | Coverage Δ | |
---|---|---|
dateparser/date.py | 99.27% <100.00%> (+<0.01%) |
:arrow_up: |
dateparser/utils/__init__.py | 95.27% <100.00%> (ø) |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
By the way, when I was adding tests, I tried to follow the pattern of the other tests in TestTimestampParser
by calling date.get_date_from_timestamp
directly, and passing in a custom value for settings
. I also saw a pattern for creating a new Settings
object, in another test in tests/test_date.py
of doing something like:
settings_mod = copy(settings)
settings_mod.RETURN_AS_TIMEZONE_AWARE = True
settings_mod.TIMEZONE = timezone
settings_mod.TO_TIMEZONE = to_timezone
date.get_date_from_timestamp(timestamp, settings_mod)
But, when I do that, it seems like I end up modifying the global settings
object imported from dateparser.conf
and that caused other tests to fail because they weren't expecting RETURN_AS_TIMEZONE_AWARE
to be True
. Is dateparser.conf.Settings
a singleton? Should the other test that tries to do a copy
be fixed? Should there be a more standard way to create a custom/one-off settings object for use in tests?
@Gallaecio Is this good to merge?
TBH, I am not able to understand the issue here. Maybe an example may help.
But if we are applying the timezone and stripping it off, to eventually apply the timezone from settings through apply_timezone_from_settings
, how does the behavior change?
Also the statement in the initial explanation
The
dateparser
code will try to interpret thatdatetime
object as if it was in thesettings.TIMEZONE
timezone, which is incorrect
Where do we see that behavior, since the apply_tiemzone_from_settings
would apply the timezone from the settings, I believe?
The examples in the added tests should show the issue. You could try running those tests on the main line code without my changes.
But if we are applying the timezone and stripping it off, to eventually apply the timezone from settings through apply_timezone_from_settings , how does the behavior change?
The problem is that there's not just one time zone that's being stripped and reapplied. The date is initially constructed with the time zone of the system (regardless of any dateparser settings), but then that time zone is not kept on the datetime. Then later in the code, the datetime is interpreted as if it was in dateparser's settings.TIMEZONE. If the system's time zone happens to be the same as the one used for dateparser's settings, then I think everything is fine. But if my system is set to 'America/New_York', and I set the dateparser settings time zone to 'UTC' and ask it to parse '1661996156', I believe dateparser will currently give me 'August 31, 2022 9:35:56 PM UTC' when it should give me 'September 1, 2022 1:35:56 AM UTC'.
Sorry if there are any errors in the above comment, I'm on mobile and it's also been a while since I first made this branch. Please test it the examples and see if what I'm saying makes sense.
Where do we see that behavior, since the apply_tiemzone_from_settings would apply the timezone from the settings, I believe?
Yes, but with the new code, the naive datetime object is created in the same time zone as settings.TIMEZONE. so that later, when it's interpreted as if it was in settings.TIMEZONE, that's now a correct assumption. With the old code, the naive datetime is created relative to the system's time zone.
Here's an example. With the current code:
>>> dateparser.parse('1661996156', settings={'TIMEZONE': 'UTC', 'TO_TIMEZONE': 'UTC', 'RETURN_AS_TIMEZONE_AWARE': True}).timestamp()
1661981756.0
The returned unix timestamp doesn't match the original timestamp. And with my branch:
>>> dateparser.parse('1661996156', settings={'TIMEZONE': 'UTC', 'TO_TIMEZONE': 'UTC', 'RETURN_AS_TIMEZONE_AWARE': True}).timestamp()
1661996156.0
It does match.
Thanks @onlynone!
It's clear now. I understand where the difference comes from.
As far as I understood, previous to, when the date goes to the apply_timezone_from_settings
, it got converted into a date object with a local timezone(without timezone included), and that date is further used for applying timezone from the mentioned method.
@gutsytechster @Gallaecio I think everything has been resolved, can it be merged?
Thank you!
Using an old version of tzlocal
, 2.1 made dateparser
return Local Mean Time instead of the correct timezone as the result of this change. After they switched away from pytz
to zoneinfo
that problem went away. You might want to bump your minimum tzlocal
version dependency.