Conversion Error: Overflow exception in date/time -> timestamp conversion
What happens?
When I try to query with params that are datetime with tzinfo I get Conversion Error: Overflow exception in date/time -> timestamp conversion.
Maybe this is a bug. Maybe its just not very clear from the documentation how to use datetime params with time zones.
I've tried searching the documentation and doing google search without luck.
To Reproduce
Run
from datetime import datetime
import duckdb
import pandas as pd
from pytz import timezone
ts_df = pd.DataFrame({ "ts": [pd.Timestamp('2024-01-01 00:00:00+0100', tz='Europe/Copenhagen'), pd.Timestamp('2024-01-02 00:00:00+0100', tz='Europe/Copenhagen')] })
query = "select * from ts_df where ts = $notationtime"
params = {"notationtime": datetime(2024, 1, 1),}
params_zoneinfo = {"notationtime": datetime(2024, 1, 1, tzinfo=timezone("Europe/Copenhagen")),}
with duckdb.connect() as conn:
conn.sql("set TimeZone = 'Europe/Copenhagen'")
print(conn.sql(query, params=params).df())
print(conn.sql(query, params=params_zoneinfo).df())
and see
$ python script.py
ts
0 2024-01-01 00:00:00+01:00
Traceback (most recent call last):
File "/home/jovyan/repos/mt-pm-reporting/script.py", line 13, in <module>
with duckdb.connect() as conn:
File "/home/jovyan/repos/mt-pm-reporting/script.py", line 17, in <module>
print(conn.sql(query, params=params_zoneinfo).df())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
duckdb.duckdb.ConversionException: Conversion Error: Overflow exception in date/time -> timestamp conversion
OS:
linux
DuckDB Version:
1.0.0
DuckDB Client:
Python
Full Name:
Marc Skov Madsen
Affiliation:
Orsted
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have not tested with any build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- [X] Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- [X] Yes, I have
Hi @MarcSkovMadsen, thanks for opening this issue.
I'm unable to reproduce it: when I run the script, I get the following output and a zero exit code:
ts
0 2024-01-01 00:00:00+01:00
Empty DataFrame
Columns: [ts]
Index: []
Thanks. Let me try once again tomorrow in a totally fresh environment and then share all dependencies.
Even though it does not error for you, I'm still confused about why the params without timezone gives the result Im after.
I've recreated my environment and installed.
uv venv
uv pip install duckdb pandas pytz
Its still errors. My environment is
I'm running on linux with Python 3.11.6 and
$ uv pip freeze
duckdb==1.0.0
numpy==2.0.0
pandas==2.2.2
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
tzdata==2024.1
I tried downgrading numpy (uv pip install 'numpy<2.0') to 1.26.4 but it still errors.
Hi @MarcSkovMadsen, I replicated your environment but it still doesn't reproduce. I tried both Python 3.11 and 3.12.
I'll let someone else from the team also take a look.
$ uv pip freeze
duckdb==1.0.0
numpy==2.0.0
pandas==2.2.2
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
tzdata==2024.1
$ .venv/bin/python script.py
ts
0 2024-01-01 00:00:00+01:00
Empty DataFrame
Columns: [ts]
Index: []
$ .venv/bin/python --version
Python 3.12.4
I'm running on linux inside a docker container based on a Debian image if that can help.
Besides the exception, I would also like to understand which of the two filtering params I should be using and why.
params = {"notationtime": datetime(2024, 1, 1),}
params_zoneinfo = {"notationtime": datetime(2024, 1, 1, tzinfo=timezone("Europe/Copenhagen")),}
@MarcSkovMadsen the problem here probably originates from this lovely piece of code: https://fossies.org/dox/openslides-2.3-portable/tzinfo_8py_source.html#l00396
Specifically:
if dt is None:
return None
We are dealing with a tzinfo object which should have its utcoffset baked in or can at least internally compute it, it makes no real sense to me to require a dt object to be passed in, so previously we just gave it None.
But I'm probably just missing some critical piece of knowledge
I think if None is provided, it's safe to assume we are interested in the internal offset, but I guess we'll push the datetime down so we can avoid this problem and any ambiguity