polars
Constructing a DateTime Series does not include timezone of values
Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of polars.
Issue Description
The Series constructor does not pick up timezone information from the provided values.
Reproducible Example
import polars as pl
from datetime import datetime
import pytz
s1 = pl.Series("dt", [datetime(2001, 1, 1)]).dt.with_time_zone(tz="UTC") # Includes time zone info
s2 = pl.Series("dt", [datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))]) # Does not include time zone info
assert s1.series_equal(s2) # Fails
Expected Behavior
I would expect the Series constructor to detect that the provided values are time zone specific, and construct the Series appropriately.
Installed Versions
---Version info---
Polars: 0.14.8
Index type: UInt32
Platform: Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python: 3.10.6 (main, Aug 15 2022, 22:17:55) [GCC 11.2.0]
---Optional dependencies---
pyarrow: 9.0.0
pandas: 1.4.4
numpy: 1.23.2
fsspec:
Similarly, timezone info is lost like this:
import datetime
import polars as pl
sample = datetime.datetime(2022, 1, 1, 23, 23, tzinfo=datetime.timezone.utc)
sample
#> datetime.datetime(2022, 1, 1, 23, 23, tzinfo=datetime.timezone.utc)
pl.Series([sample])[0]
#> datetime.datetime(2022, 1, 1, 23, 23)
pl.__version__
#> '0.14.9'
This is currently by design, as I am not really sure how to deal with that efficiently. Currently it is up to the caller to set the timezone once a Series is constructed.
Thanks for the quick answer. You mean efficiency from a compute perspective? It's just a little bit unexpected if you have a timezone set on the input and that information is lost. Surely the person who sets it had an intention of preserving it? Alternatively, maybe issue a warning when a time zone is set on input?
assert s1.series_equal(s2) # Fails
I'm confused by the report - are you saying that it fails, or that you expect it to fail? I'm not seeing any failure:
In [10]: import polars as pl
...: import pytz
...:
...: s1 = pl.Series("dt", [datetime(2001, 1, 1)]).dt.with_time_zone(tz="UTC") # Includes time zone info
...: s2 = pl.Series("dt", [datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))]) # Does not include time zone info
...:
In [11]: s1
Out[11]:
shape: (1,)
Series: 'dt' [datetime[μs, UTC]]
[
2001-01-01 00:00:00 UTC
]
In [12]: s2
Out[12]:
shape: (1,)
Series: 'dt' [datetime[μs, UTC]]
[
2001-01-01 00:00:00 UTC
]
In [13]: assert s1.series_equal(s2)
It fails. Just verified, still raises an AssertionError.
Version info:
---Version info---
Polars: 0.15.14
Index type: UInt32
Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python: 3.11.0 (main, Nov 1 2022, 09:16:00) [GCC 11.2.0]
---Optional dependencies---
pyarrow: 10.0.1
pandas: 1.5.2
numpy: 1.24.0
fsspec: 2022.11.0
connectorx: <not installed>
xlsx2csv: 0.8.1
matplotlib: <not installed>
🤔 how odd, it doesn't raise anything for me, and we have practically the same setup
(.311venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ cat t.py
import polars as pl
from datetime import datetime
import pytz
s1 = pl.Series("dt", [datetime(2001, 1, 1)]).dt.with_time_zone(tz="UTC") # Includes time zone info
s2 = pl.Series("dt", [datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))]) # Does not include time zone info
assert s1.series_equal(s2) # Fails
(.311venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ python t.py
(.311venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ python -c 'import polars; print(polars.show_versions())'
---Version info---
Polars: 0.15.15
Index type: UInt32
Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python: 3.11.1 (main, Dec 7 2022, 01:11:34) [GCC 11.3.0]
---Optional dependencies---
pyarrow: 10.0.1
pandas: <not installed>
numpy: 1.24.1
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: <not installed>
None
I'm not getting any assertion error running it in a Kaggle notebook either: https://www.kaggle.com/code/marcogorelli/polars-issue-4700/notebook
Have I misunderstood something about how to run the snippet?
I think the polars part works correctly.
The "weird" part happens in Python's astimezone: on a naive datetime it first assumes the value is in your local timezone, and only then converts it to the timezone you pass. (@MarcoGorelli I assume your default timezone is UTC+0, so that is why it works for you.)
datetime.datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))
In [55]: d = datetime.datetime(2001, 1, 1)
In [56]: ?d.astimezone
Docstring: tz -> convert to local time in new timezone tz
In [49]: datetime.datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))
Out[49]: datetime.datetime(2000, 12, 31, 23, 0, tzinfo=<UTC>)
In [50]: datetime.datetime(2001, 1, 1).astimezone(pytz.timezone("Europe/Brussels"))
Out[50]: datetime.datetime(2001, 1, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Brussels' CET+1:00:00 STD>)
In [51]: pl.Series("dt", [datetime.datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))])
Out[51]:
shape: (1,)
Series: 'dt' [datetime[μs, UTC]]
[
2000-12-31 23:00:00 UTC
]
In [52]: pl.Series("dt", [datetime.datetime(2001, 1, 1).astimezone(pytz.timezone("Europe/Brussels"))])
Out[52]:
shape: (1,)
Series: 'dt' [datetime[μs, Europe/Brussels]]
[
2001-01-01 00:00:00 CET
]
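The astimezone behaviour described above can be checked with the standard library alone, without polars or pytz. This is a minimal sketch: it uses datetime.timezone.utc in place of pytz.timezone("UTC"), and it compares converting a naive value with astimezone against simply attaching UTC with replace(tzinfo=...). The difference between the two should be exactly the machine's local UTC offset on that date, which is why the result depends on where the code runs.

```python
from datetime import datetime, timezone

naive = datetime(2001, 1, 1)

# astimezone() on a naive datetime first interprets it as local wall-clock time,
# then converts that instant to the requested zone.
converted = naive.astimezone(timezone.utc)

# replace(tzinfo=...) labels the same wall-clock time as UTC, with no conversion.
labelled = naive.replace(tzinfo=timezone.utc)

# The gap between the two is the machine's local UTC offset on that date
# (zero on a UTC+0 machine, one hour in Europe/Brussels in winter).
local_offset = naive.astimezone().utcoffset()
print(converted, labelled, local_offset)
```

On a UTC+0 machine converted and labelled are the same instant, which matches the "works for me" reports; elsewhere they differ by the local offset.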
I assume your default timezone is UTC+0, so that is why it works for you
True (I'm in the UK), but still works for me even if I set a different timezone (which I'm most definitely not in):
In [64]: import polars as pl
...: import pytz
...:
...: tz = 'US/Pacific'
...: s1 = pl.Series("dt", [datetime(2001, 1, 1)]).dt.with_time_zone(tz=tz)
...: s2 = pl.Series("dt", [datetime(2001, 1, 1).astimezone(pytz.timezone(tz))])
...:
...: assert s1.series_equal(s2)
In [65]: s1
Out[65]:
shape: (1,)
Series: 'dt' [datetime[μs, US/Pacific]]
[
2000-12-31 16:00:00 PST
]
In [66]: s2
Out[66]:
shape: (1,)
Series: 'dt' [datetime[μs, US/Pacific]]
[
2000-12-31 16:00:00 PST
]
Also, your code shows that pl.Series("dt", [datetime.datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))]) does indeed include timezone info (Series: 'dt' [datetime[μs, UTC]]), whereas the original report has a comment on that line saying # Does not include time zone info
I'm generally interested in time-series, but if I can't reproduce the issue then I don't know where to start - so any help with reproducing this would be appreciated
It might be that I fixed this already 🙈 Hmm.. @stinodego could you try on master?
OK, I think it actually changed from when I initially reported it, but this is what happens now (from the master branch):
import polars as pl
from datetime import datetime
import pytz
# Correct output
s1 = pl.Series("dt", [datetime(2001, 1, 1)]).dt.with_time_zone(tz="UTC") # Includes time zone info
print(s1)
# shape: (1,)
# Series: 'dt' [datetime[μs, UTC]]
# [
# 2001-01-01 00:00:00 UTC
# ]
# Incorrect output
s2 = pl.Series("dt", [datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))]) # Time shifts by an hour??
print(s2)
# shape: (1,)
# Series: 'dt' [datetime[μs, UTC]]
# [
# 2000-12-31 23:00:00 UTC
# ]
assert s1.series_equal(s2) # Fails
As you can see, the resulting Series now actually contains timezone information, but the underlying datetime is incorrect.
Ah got it, here's how to reproduce if you live in a UTC+0 timezone:
import polars as pl
from datetime import datetime
import pytz
import os
import time
os.environ['TZ'] = 'Europe/Brussels'
time.tzset()
s1 = pl.Series("dt", [datetime(2001, 1, 1)]).dt.with_time_zone(tz="UTC")
s2 = pl.Series("dt", [datetime(2001, 1, 1).astimezone(pytz.timezone("UTC"))])
print(s1)
print(s2)
assert s1.series_equal(s2) # Fails
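The same local-timezone dependence shows up without polars at all. The sketch below is a stdlib-only variant of the reproduction above. One assumption differs from the snippet: instead of 'Europe/Brussels' it uses the POSIX TZ string "CET-1" (a fixed UTC+1 zone with no DST rules), so it does not rely on tz database files being installed. Like time.tzset itself, it only works on Unix.

```python
import os
import time
from datetime import datetime, timezone

# Force a UTC+1 local zone for this process (Unix only). "CET-1" is a POSIX TZ
# string: fixed UTC+1, no DST rules, so no tz database files are needed.
old_tz = os.environ.get("TZ")
os.environ["TZ"] = "CET-1"
time.tzset()

# The naive value is read as local (UTC+1) wall-clock time, then converted to UTC,
# which moves it back one hour.
converted = datetime(2001, 1, 1).astimezone(timezone.utc)
print(converted)  # 2000-12-31 23:00:00+00:00

# Restore the previous TZ setting for the rest of the process.
if old_tz is None:
    del os.environ["TZ"]
else:
    os.environ["TZ"] = old_tz
time.tzset()
```

This is the same one-hour shift seen in s2 above, produced before polars ever sees the value.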
Right, so as far as I can tell:
- in polars, .dt.with_time_zone(tz) on a naive time series will convert from UTC to tz
- in Python datetime, astimezone(tz) converts from your local timezone to tz
(indeed, as @ghuls had said, I'd just misunderstood, sorry)
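The two conventions in the list above can be contrasted in plain Python. This is a sketch, not polars code: "polars_like" only emulates the semantics described in this thread for .dt.with_time_zone on a naive series (treat the value as UTC, then convert), and a fixed UTC-8 offset named "PST" stands in for US/Pacific in winter so that no tz database is required.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2001, 1, 1)
pst = timezone(timedelta(hours=-8), "PST")  # fixed-offset stand-in for US/Pacific (winter)

# Emulates the polars behaviour described above: naive value treated as UTC, then converted.
polars_like = naive.replace(tzinfo=timezone.utc).astimezone(pst)

# Python's astimezone(): naive value treated as local time, then converted.
python_style = naive.astimezone(pst)

print(polars_like)  # 2000-12-31 16:00:00-08:00
# These represent the same instant only when the local UTC offset on that date is zero.
print(polars_like == python_style)
```

The 16:00 result lines up with the US/Pacific output shown earlier in the thread; whether python_style agrees depends on the machine's local timezone.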
So, all looks correct, I'd suggest just adding an example and test and closing - I'll make a quick PR
Wow, you're right. The conversion already happens in Python datetime.
Feels very unintuitive to me, but timezones do that sometimes. At least Polars seems to handle things correctly.
A PR is welcome; then we can close this.