duckdb icon indicating copy to clipboard operation
duckdb copied to clipboard

Conversion Error: Overflow exception in date/time -> timestamp conversion

Open MarcSkovMadsen opened this issue 1 year ago • 9 comments

What happens?

When I try to query with params that are datetime with tzinfo I get Conversion Error: Overflow exception in date/time -> timestamp conversion.

Maybe this is a bug. Maybe its just not very clear from the documentation how to use datetime params with time zones.

I've tried searching the documentation and doing google search without luck.

To Reproduce

Run

from datetime import datetime

import duckdb
import pandas as pd
from pytz import timezone

ts_df = pd.DataFrame({ "ts": [pd.Timestamp('2024-01-01 00:00:00+0100', tz='Europe/Copenhagen'), pd.Timestamp('2024-01-02 00:00:00+0100', tz='Europe/Copenhagen')] })

query = "select * from ts_df where ts = $notationtime"
params = {"notationtime": datetime(2024, 1, 1),}
params_zoneinfo = {"notationtime": datetime(2024, 1, 1, tzinfo=timezone("Europe/Copenhagen")),}

with duckdb.connect() as conn:
    conn.sql("set TimeZone = 'Europe/Copenhagen'")
    
    print(conn.sql(query, params=params).df())
    print(conn.sql(query, params=params_zoneinfo).df())

and see

$ python script.py
                         ts
0 2024-01-01 00:00:00+01:00
Traceback (most recent call last):
  File "/home/jovyan/repos/mt-pm-reporting/script.py", line 13, in <module>
    with duckdb.connect() as conn:
  File "/home/jovyan/repos/mt-pm-reporting/script.py", line 17, in <module>
    print(conn.sql(query, params=params_zoneinfo).df())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
duckdb.duckdb.ConversionException: Conversion Error: Overflow exception in date/time -> timestamp conversion

OS:

linux

DuckDB Version:

1.0.0

DuckDB Client:

Python

Full Name:

Marc Skov Madsen

Affiliation:

Orsted

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have not tested with any build

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • [X] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • [X] Yes, I have

MarcSkovMadsen avatar Jun 27 '24 17:06 MarcSkovMadsen

Hi @MarcSkovMadsen, thanks for opening this issue.

I'm unable to reproduce it: when I run the script, I get the following output and a zero exit code:

                         ts
0 2024-01-01 00:00:00+01:00
Empty DataFrame
Columns: [ts]
Index: []

szarnyasg avatar Jun 27 '24 17:06 szarnyasg

Thanks. Let me try once again tomorrow in a totally fresh environment and then share all dependencies.

MarcSkovMadsen avatar Jun 27 '24 20:06 MarcSkovMadsen

Even though it does not error for you, I'm still confused about why the params without timezone gives the result Im after.

MarcSkovMadsen avatar Jun 27 '24 20:06 MarcSkovMadsen

I've recreated my environment and installed.

uv venv
uv pip install duckdb pandas pytz

Its still errors. My environment is

I'm running on linux with Python 3.11.6 and

$ uv pip freeze
duckdb==1.0.0
numpy==2.0.0
pandas==2.2.2
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
tzdata==2024.1

MarcSkovMadsen avatar Jun 28 '24 03:06 MarcSkovMadsen

I tried downgrading numpy (uv pip install 'numpy<2.0') to 1.26.4 but it still errors.

MarcSkovMadsen avatar Jun 28 '24 03:06 MarcSkovMadsen

Hi @MarcSkovMadsen, I replicated your environment but it still doesn't reproduce. I tried both Python 3.11 and 3.12.

I'll let someone else from the team also take a look.

$ uv pip freeze
duckdb==1.0.0
numpy==2.0.0
pandas==2.2.2
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
tzdata==2024.1

$ .venv/bin/python script.py
                         ts
0 2024-01-01 00:00:00+01:00
Empty DataFrame
Columns: [ts]
Index: []

$ .venv/bin/python --version
Python 3.12.4

szarnyasg avatar Jun 28 '24 06:06 szarnyasg

I'm running on linux inside a docker container based on a Debian image if that can help.

MarcSkovMadsen avatar Jun 28 '24 11:06 MarcSkovMadsen

Besides the exception, I would also like to understand which of the two filtering params I should be using and why.

params = {"notationtime": datetime(2024, 1, 1),}
params_zoneinfo = {"notationtime": datetime(2024, 1, 1, tzinfo=timezone("Europe/Copenhagen")),}

MarcSkovMadsen avatar Jun 28 '24 11:06 MarcSkovMadsen

@MarcSkovMadsen the problem here probably originates from this lovely piece of code: https://fossies.org/dox/openslides-2.3-portable/tzinfo_8py_source.html#l00396

Specifically:

        if dt is None:
            return None

We are dealing with a tzinfo object which should have its utcoffset baked in or can at least internally compute it, it makes no real sense to me to require a dt object to be passed in, so previously we just gave it None. But I'm probably just missing some critical piece of knowledge

I think if None is provided, it's safe to assume we are interested in the internal offset, but I guess we'll push the datetime down so we can avoid this problem and any ambiguity

Tishj avatar Jun 28 '24 11:06 Tishj