
SNOW-620952: After upgrading to version 2.7.9, empty Snowflake table is being created when using pandas to_sql(method=pd_writer)

Open pybokeh opened this issue 3 years ago • 10 comments

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    3.9.13

  2. What operating system and processor architecture are you using?

    Windows 10

  3. What are the component versions in the environment (pip freeze)?

    snowflake-connector-python==2.7.9 snowflake-sqlalchemy==1.3.4

  4. What did you do?

import pandas as pd
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
from snowflake.connector.pandas_tools import pd_writer

df = pd.read_csv("some_path_to_csv_file")

engine = create_engine(
    URL(
        user=some_user_name,
        password=some_password,
        account=some_account,
        authenticator=some_authenticator,
        database=some_database,
        schema=some_schema,
        warehouse=some_warehouse,
        role=some_role,
    )
)

table_name = some_table_name

with engine.connect() as conn:
    df.to_sql(name=table_name, con=conn, index=False, method=pd_writer, if_exists='replace')

  5. What did you expect to see?

Expected to see a Snowflake table created with rows of data in it. Instead, I got a Snowflake table with no rows in it, and no warning or error message whatsoever. The table exists with columns, but no rows of data. The same code ran fine with the earlier versions snowflake-connector-python==2.7.1 and snowflake-sqlalchemy==1.3.2, so I downgraded to those versions, and the table is now created with rows of data in it.

pybokeh avatar Jul 01 '22 03:07 pybokeh

Having the exact same issue. Python 3.8.8, Windows 11, snowflake-connector-python==2.7.9, snowflake-sqlalchemy==1.4.0

mjng93 avatar Aug 16 '22 21:08 mjng93

The same here, Python 3.8.8, Windows 11, snowflake-connector-python==2.7.11

listewka avatar Aug 26 '22 11:08 listewka

Same issue here. Python 3.9.10, snowflake-connector-python==2.7.12, snowflake-sqlalchemy==1.4.41

flaviolisdero avatar Sep 19 '22 16:09 flaviolisdero

same here. Python 3.9.14, snowflake-connector-python==2.7.12, snowflake-sqlalchemy==1.4.1

captnswing avatar Sep 23 '22 15:09 captnswing

Downgrading to snowflake-connector-python==2.7.6 fixed the problem. It seems to me it was introduced in version 2.7.7.

captnswing avatar Sep 23 '22 15:09 captnswing

I experienced a similar issue. I tried downgrading as @captnswing suggested. That didn't work in my case.

I added a little logging snippet:

import logging

logging.basicConfig(
    level=logging.INFO,
    # format="%(asctime)s [%(levelname)s] %(filename)s %(module)s %(funcName)s %(lineno)d %(message)s",
    format="%(message)s",
    handlers=[
        logging.FileHandler("debug.log"),
        logging.StreamHandler()
    ]
)

... and noticed that the data is copied into the table but, for some reason, the transaction is rolled back right at the end:

Snowflake Connector for Python Version: 2.7.12, Python Version: 3.9.6, Platform: macOS-12.5.1-arm64-arm-64bit
This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
Setting use_openssl_only mode to False
query: [select current_database(), current_schema();]
query execution done
query: [ROLLBACK]
query execution done
query: [DESC TABLE /* sqlalchemy:_has_object */ mytable]
query execution done
query: [DESC TABLE /* sqlalchemy:_has_object */ mytable]
query execution done
query: [SHOW /* sqlalchemy:get_table_names */ TABLES IN myschema]
query execution done
query: [select current_database(), current_schema();]
query execution done
query: [SHOW /* sqlalchemy:_get_schema_primary_keys */PRIMARY KEYS IN SCHEMA dev.myschem...]
query execution done
query: [SELECT /* sqlalchemy:_get_schema_columns */ ic.table_name, ic.column_name, ic.da...]
query execution done
query: [SHOW /* sqlalchemy:_get_schema_foreign_keys */ IMPORTED KEYS IN SCHEMA dev.dbt_d...]
query execution done
query: [SHOW /* sqlalchemy:_get_schema_unique_constraints */ UNIQUE KEYS IN SCHEMA dev.d...]
query execution done
query: [SHOW /* sqlalchemy:_get_table_comment */ TABLES LIKE 'mytable']
query execution done
query: [DROP TABLE mytable]
query execution done
query: [CREATE TABLE mytable ( "SomeID" TEXT, "SomeName" TEXT, "SomeDescriptionThatsLon"...]
query execution done
query: [create temporary stage /* Python:snowflake.connector.pandas_tools.write_pandas()...]
query execution done
query: [PUT /* Python:snowflake.connector.pandas_tools.write_pandas() */ 'file:///var/fo...]
query execution done
query: [COPY INTO "MYTABLE" /* Python:snowflake.connector.pandas_tools.write_pandas() */...]
query execution done
query: [create temporary stage /* Python:snowflake.connector.pandas_tools.write_pandas()...]
query execution done
query: [PUT /* Python:snowflake.connector.pandas_tools.write_pandas() */ 'file:///var/fo...]
query execution done
query: [COPY INTO "MYTABLE" /* Python:snowflake.connector.pandas_tools.write_pandas() */...]
query execution done
query: [ROLLBACK]
query execution done

There's no error message.
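The shape of that log, where the table survives but the rows vanish, can be reproduced with plain sqlite3 as a stand-in: Snowflake, like SQLite here, autocommits DDL such as CREATE TABLE, while the uncommitted COPY INTO is undone by the trailing ROLLBACK. This is only an analogy for the mechanism, not the connector's actual code path:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

conn = sqlite3.connect(db)
conn.execute("CREATE TABLE t (a INTEGER)")     # DDL: takes effect immediately (autocommit)
conn.execute("INSERT INTO t VALUES (1), (2)")  # DML: runs in an implicit transaction
conn.rollback()                                # the trailing ROLLBACK from the log
conn.close()

conn = sqlite3.connect(db)
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()
print(count)  # 0: the table exists, but the rows were rolled back
```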

alexwoolford avatar Sep 24 '22 03:09 alexwoolford

Same issue here: snowflake-connector-python==2.7.12, snowflake-sqlalchemy==1.4.1, SQLAlchemy==1.4.41. Tried downgrading and it didn't work for me.

chrisdurban87 avatar Sep 26 '22 16:09 chrisdurban87

I used write_pandas as a workaround:

from snowflake.connector.pandas_tools import write_pandas
import snowflake.connector
import pandas as pd
import os

con = snowflake.connector.connect(
    account=os.getenv("SNOWFLAKE_ACCOUNT"),
    user=os.getenv("SNOWFLAKE_USER"),
    password=os.getenv("SNOWFLAKE_PASSWORD"),
    database=os.getenv("SNOWFLAKE_DATABASE"),
    warehouse=os.getenv("SNOWFLAKE_WAREHOUSE"),
    role=os.getenv("SNOWFLAKE_ROLE"),
    schema=os.getenv("SNOWFLAKE_SCHEMA")
)

table_name = "MYTABLE"
df = pd.read_csv("/path/to/data", delimiter="|")
df.reset_index(drop=True, inplace=True)
success, nchunks, nrows, _ = write_pandas(con,
                                          df,
                                          table_name,
                                          chunk_size=10000,
                                          auto_create_table=True,
                                          schema=os.getenv("SNOWFLAKE_SCHEMA"))

It's not quite as elegant as to_sql but got the job done.

alexwoolford avatar Sep 26 '22 17:09 alexwoolford

upgrading to snowflake-connector-python==2.8.0 fixed the problem for me

captnswing avatar Oct 10 '22 20:10 captnswing

I think I found the issue; I've been struggling with the same. A bit of background: my company uses SSO, so I have to pass the authenticator='externalbrowser' option to URL. Because of this I set engine = create_engine(URL(...)).connect() instead of engine = create_engine(URL(...)), so that multiple pd.read_sql calls don't each trigger the login sequence. That works fine for reading data, but when inserting, it just creates the table in Snowflake and does not insert the data. I fixed it by removing the .connect(), and now I see the rows being inserted.

In your code, try to replace

with engine.connect() as conn:
    df.to_sql(name=table_name, con=conn, index=False, method=pd_writer, if_exists='replace')

with simply

df.to_sql(name=table_name, con=engine, index=False, method=pd_writer, if_exists='replace')

To developers: I'm not sure if this is a pd_writer issue or pandas issue. Would you be able to alter the relevant module to use either engine or engine.connect() in the con argument?
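For the developers' reference, a hypothetical sketch (not the actual pandas or connector code) of a helper that accepts either an Engine or a Connection and guarantees an open transaction, demonstrated on an in-memory SQLite engine:

```python
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.engine import Engine

@contextmanager
def ensure_transaction(con):
    """Yield a Connection with an open transaction, given either an Engine or a Connection."""
    if isinstance(con, Engine):
        with con.begin() as conn:   # checks out a connection; commits on exit
            yield conn
    elif con.in_transaction():      # caller already opened a transaction
        yield con
    else:
        with con.begin():           # open (and commit) a transaction on the caller's connection
            yield con

# Demo on an in-memory SQLite engine standing in for Snowflake.
engine = create_engine("sqlite://")
with ensure_transaction(engine) as conn:
    conn.execute(text("CREATE TABLE t (a INTEGER)"))
    conn.execute(text("INSERT INTO t VALUES (1)"))

with engine.connect() as conn:
    total = conn.execute(text("SELECT COUNT(*) FROM t")).scalar()
print(total)  # 1
```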

budaesandrei avatar Oct 13 '22 10:10 budaesandrei

> @budaesandrei: In your code, try replacing the with engine.connect() as conn: block with simply df.to_sql(name=table_name, con=engine, index=False, method=pd_writer, if_exists='replace')

This worked for me as well, interesting. Thank you!

ghost avatar Oct 26 '22 03:10 ghost

> @budaesandrei: In your code, try replacing the with engine.connect() as conn: block with simply df.to_sql(name=table_name, con=engine, index=False, method=pd_writer, if_exists='replace')

This worked for my use cases as well. Thank you!

UmutZho avatar Nov 10 '22 10:11 UmutZho

Looking at the pandas code gives some more hints about the problem. See: https://github.com/pandas-dev/pandas/blob/v1.5.1/pandas/io/sql.py#L1396-L1399

Based on this, @budaesandrei's solution works; alternatively, if you would like to keep the engine.connect() as conn form, wrap the write in an explicit transaction:

with engine.connect() as conn:
    with conn.begin():
        df.to_sql(name=table_name, con=conn, index=False, method=pd_writer, if_exists='replace')
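Both working patterns can be exercised together; here is a sketch against an in-memory SQLite engine as a stand-in, with the Snowflake-specific method=pd_writer omitted:

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory stand-in for the Snowflake engine
df = pd.DataFrame({"ID": [1, 2, 3]})

# Pattern 1 (@budaesandrei): pass the Engine; pandas opens and commits its own transaction.
df.to_sql(name="t1", con=engine, index=False, if_exists="replace")

# Pattern 2: keep engine.connect(), but wrap the write in an explicit transaction.
with engine.connect() as conn:
    with conn.begin():
        df.to_sql(name="t2", con=conn, index=False, if_exists="replace")

with engine.connect() as conn:
    n1 = conn.execute(text("SELECT COUNT(*) FROM t1")).scalar()
    n2 = conn.execute(text("SELECT COUNT(*) FROM t2")).scalar()
print(n1, n2)  # 3 3
```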

sfc-gh-aalam avatar Nov 22 '22 19:11 sfc-gh-aalam