
Added timezone type to dfs when the corresponding pd Df also has timezones

Open frederiksteiner opened this issue 1 year ago • 3 comments

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #1952

  2. Fill out the following pre-review checklist:

    • [ ] I am adding a new automated test(s) to verify correctness of my new code
    • [ ] I am adding new logging messages
    • [ ] I am adding a new telemetry message
    • [ ] I am modifying authorization mechanisms
    • [ ] I am adding new credentials
    • [ ] I am modifying OCSP code
    • [ ] I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Checks if there are any tz_columns and changes the column mapping such that the resulting type is a timezone type
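The detection step described above can be sketched roughly as follows. The actual column-mapping change lives inside the connector's `write_pandas` path; `tz_columns` here is a hypothetical helper name, not the connector's API:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def tz_columns(df: pd.DataFrame) -> list:
    # Columns whose pandas dtype carries timezone information
    return [
        col for col in df.columns
        if isinstance(df[col].dtype, pd.DatetimeTZDtype)
    ]


df = pd.DataFrame({
    "DT": pd.to_datetime(
        [datetime(2020, 1, 2, tzinfo=timezone(timedelta(hours=2)))], utc=True
    ),
    "N": [1],
})
print(tz_columns(df))  # ['DT']
```

A column found this way would then get a timezone-aware type in the resulting mapping instead of a naive timestamp type.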

frederiksteiner avatar May 24 '24 08:05 frederiksteiner

Didn't manage to install tox on my device, so some checks might be failing :/

frederiksteiner avatar Jun 28 '24 20:06 frederiksteiner

Isn't that to be expected? Types are per column, so this cannot be done correctly anyway. But I think it gets handled correctly when saving to parquet:

import pytz
import pandas as pd
import pyarrow.parquet as pq
from datetime import datetime
df = pd.DataFrame({"DT": [
    datetime.now(tz=pytz.timezone("Europe/Amsterdam")),
    datetime.now(tz=pytz.timezone("UTC")),
    ]})

df.to_parquet("test.parquet")
pq.read_schema("test.parquet")

This gives the following output:

DT: timestamp[us, tz=Europe/Amsterdam]
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 366

And the resulting dataframe looks as follows when re-reading it:

                                 DT
0  2024-07-01 08:54:18.603081+02:00
1  2024-07-01 08:54:18.603099+02:00

with dtypes: DT datetime64[us, Europe/Amsterdam].
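As a side note on the per-column limitation: pandas itself falls back to object dtype when values in one column disagree on timezone, and normalizing them (e.g. to UTC) restores a single tz-aware dtype. A minimal sketch, independent of parquet:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Two values with different UTC offsets in the same column
vals = [
    datetime(2024, 7, 1, 8, 54, 18, tzinfo=timezone(timedelta(hours=2))),
    datetime(2024, 7, 1, 6, 54, 18, tzinfo=timezone.utc),
]
df = pd.DataFrame({"DT": vals})
print(df["DT"].dtype)  # mixed offsets -> plain object dtype

# Normalizing to UTC yields a single tz-aware datetime64 column
df["DT"] = pd.to_datetime(df["DT"], utc=True)
print(df["DT"].dtype)  # datetime64[ns, UTC]
```

This mirrors what pyarrow does on write: the column ends up with exactly one timezone in the schema.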

frederiksteiner avatar Jul 01 '24 07:07 frederiksteiner

@pytest.mark.parametrize("use_logical_type", [None, True, False])
def test_write_pandas_use_logical_type(
    conn_cnx: Callable[..., Generator[SnowflakeConnection, None, None]],
    use_logical_type: bool | None,
):
    table_name = random_string(5, "USE_LOCAL_TYPE_").upper()
    col_name = "DT"
    create_sql = f"CREATE OR REPLACE TABLE {table_name} ({col_name} TIMESTAMP_TZ)"
    select_sql = f"SELECT * FROM {table_name}"
    drop_sql = f"DROP TABLE IF EXISTS {table_name}"
    timestamp = datetime(
        year=2020,
        month=1,
        day=2,
        hour=3,
        minute=4,
        second=5,
        microsecond=6,
        tzinfo=timezone(timedelta(hours=2)),
    )
# ----- changed/new lines start -----
    timestamp_2 = datetime(
        year=2020,
        month=1,
        day=2,
        hour=3,
        minute=4,
        second=5,
        microsecond=6,
        tzinfo=timezone(timedelta(hours=4)),
    )
    df_write = pandas.DataFrame({col_name: [timestamp, timestamp_2]})
# ----- changed/new lines end -----

    with conn_cnx() as cnx:  # type: SnowflakeConnection
        cnx.cursor().execute(create_sql).fetchall()

        write_pandas_kwargs = dict(
            conn=cnx,
            df=df_write,
            use_logical_type=use_logical_type,
            auto_create_table=False,
            table_name=table_name,
        )

        try:
            # When use_logical_type = True, datetimes with timestamps should be
            # correctly written to Snowflake.
            if use_logical_type:
                write_pandas(**write_pandas_kwargs)
                df_read = cnx.cursor().execute(select_sql).fetch_pandas_all()
                assert all(df_write == df_read)
                assert pandas.api.types.is_datetime64tz_dtype(df_read[col_name])
            # For other use_logical_type values, a UserWarning should be displayed.
            else:
                with pytest.warns(UserWarning, match="Dataframe contains a datetime.*"):
                    write_pandas(**write_pandas_kwargs)
        finally:
            cnx.execute_string(drop_sql)

When adding an example similar to yours to the unit test, everything works with use_logical_type=True, but not when it is None or False, since then the check here fails.

Hence that check probably needs adapting.
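The guard under discussion could look roughly like the sketch below. `warn_if_tz_without_logical_type` is a hypothetical stand-in for the connector's internal check, with the warning text modeled on the `match="Dataframe contains a datetime.*"` pattern from the test above:

```python
import warnings
from datetime import datetime, timezone

import pandas as pd


def warn_if_tz_without_logical_type(df: pd.DataFrame, use_logical_type) -> None:
    # Warn when the frame carries tz-aware datetimes but logical
    # types are not enabled, since the timezone would be lost.
    has_tz = any(
        isinstance(df[col].dtype, pd.DatetimeTZDtype) for col in df.columns
    )
    if has_tz and not use_logical_type:
        warnings.warn(
            "Dataframe contains a datetime with timezone; "
            "pass use_logical_type=True to preserve it.",
            UserWarning,
        )


df = pd.DataFrame(
    {"DT": pd.to_datetime([datetime(2020, 1, 2, tzinfo=timezone.utc)], utc=True)}
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_tz_without_logical_type(df, use_logical_type=False)
print(len(caught))  # one warning captured
```

With use_logical_type=True the same call would emit nothing, matching the branch split in the test.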

frederiksteiner avatar Jul 01 '24 07:07 frederiksteiner