snowflake-connector-python
SNOW-620952: After upgrading to version 2.7.9, empty Snowflake table is being created when using pandas to_sql(method=pd_writer)
Please answer these questions before submitting your issue. Thanks!
- What version of Python are you using?
  3.9.13
- What operating system and processor architecture are you using?
  Windows 10
- What are the component versions in the environment (pip freeze)?
  snowflake-connector-python==2.7.9
  snowflake-sqlalchemy==1.3.4
- What did you do?
import pandas as pd
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
from snowflake.connector.pandas_tools import pd_writer

df = pd.read_csv("some_path_to_csv_file")
engine = create_engine(
    URL(
        user=some_user_name,
        password=some_password,
        account=some_account,
        authenticator=some_authenticator,
        database=some_database,
        schema=some_schema,
        warehouse=some_warehouse,
        role=some_role,
    )
)
table_name = some_table_name
with engine.connect() as conn:
    df.to_sql(name=table_name, con=conn, index=False, method=pd_writer, if_exists='replace')
- What did you expect to see?
Expected to see a Snowflake table created with rows of data in it. Instead, I got a Snowflake table with no rows in it, and no warning or error message whatsoever. The table exists with its columns, but no rows of data. The same code ran fine with the earlier versions snowflake-connector-python==2.7.1 and snowflake-sqlalchemy==1.3.2, so I downgraded to those versions. After downgrading, the Snowflake table is created with its rows of data.
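For anyone wanting to reproduce the workaround, the downgrade amounts to pinning both packages to the versions named above:

pip install "snowflake-connector-python==2.7.1" "snowflake-sqlalchemy==1.3.2"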
Having the exact same issue. Python 3.8.8, Windows 11, snowflake-connector-python==2.7.9, snowflake-sqlalchemy==1.4.0
The same here, Python 3.8.8, Windows 11, snowflake-connector-python==2.7.11
Same issue here. Python 3.9.10, snowflake-connector-python==2.7.12, snowflake-sqlalchemy==1.4.41
Same here. Python 3.9.14, snowflake-connector-python==2.7.12, snowflake-sqlalchemy==1.4.1
Downgrading to snowflake-connector-python==2.7.6 fixed the problem. It seems to me it was introduced in version 2.7.7.
I experienced a similar issue. I tried downgrading as @captnswing suggested. That didn't work in my case.
I added a little logging snippet:
import logging

logging.basicConfig(
    level=logging.INFO,
    # format="%(asctime)s [%(levelname)s] %(filename)s %(module)s %(funcName)s %(lineno)d %(message)s",
    format="%(message)s",
    handlers=[
        logging.FileHandler("debug.log"),
        logging.StreamHandler(),
    ],
)
... and noticed that the data is copied into the table but, for some reason, the transaction is rolled back right at the end:
Snowflake Connector for Python Version: 2.7.12, Python Version: 3.9.6, Platform: macOS-12.5.1-arm64-arm-64bit
This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
Setting use_openssl_only mode to False
query: [select current_database(), current_schema();]
query execution done
query: [ROLLBACK]
query execution done
query: [DESC TABLE /* sqlalchemy:_has_object */ mytable]
query execution done
query: [DESC TABLE /* sqlalchemy:_has_object */ mytable]
query execution done
query: [SHOW /* sqlalchemy:get_table_names */ TABLES IN myschema]
query execution done
query: [select current_database(), current_schema();]
query execution done
query: [SHOW /* sqlalchemy:_get_schema_primary_keys */PRIMARY KEYS IN SCHEMA dev.myschem...]
query execution done
query: [SELECT /* sqlalchemy:_get_schema_columns */ ic.table_name, ic.column_name, ic.da...]
query execution done
query: [SHOW /* sqlalchemy:_get_schema_foreign_keys */ IMPORTED KEYS IN SCHEMA dev.dbt_d...]
query execution done
query: [SHOW /* sqlalchemy:_get_schema_unique_constraints */ UNIQUE KEYS IN SCHEMA dev.d...]
query execution done
query: [SHOW /* sqlalchemy:_get_table_comment */ TABLES LIKE 'mytable']
query execution done
query: [DROP TABLE mytable]
query execution done
query: [CREATE TABLE mytable ( "SomeID" TEXT, "SomeName" TEXT, "SomeDescriptionThatsLon"...]
query execution done
query: [create temporary stage /* Python:snowflake.connector.pandas_tools.write_pandas()...]
query execution done
query: [PUT /* Python:snowflake.connector.pandas_tools.write_pandas() */ 'file:///var/fo...]
query execution done
query: [COPY INTO "MYTABLE" /* Python:snowflake.connector.pandas_tools.write_pandas() */...]
query execution done
query: [create temporary stage /* Python:snowflake.connector.pandas_tools.write_pandas()...]
query execution done
query: [PUT /* Python:snowflake.connector.pandas_tools.write_pandas() */ 'file:///var/fo...]
query execution done
query: [COPY INTO "MYTABLE" /* Python:snowflake.connector.pandas_tools.write_pandas() */...]
query execution done
query: [ROLLBACK]
query execution done
There's no error message.
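One way to make the silent failure visible is to count rows right after to_sql returns (a minimal check reusing the engine, df, and table_name from the snippets above):

from sqlalchemy import text

with engine.connect() as conn:
    # count the rows that actually landed in the target table
    nrows = conn.execute(text(f"SELECT COUNT(*) FROM {table_name}")).scalar()
assert nrows == len(df), f"expected {len(df)} rows, found {nrows}"

With the affected versions this comes back as 0, even though the COPY INTO in the log above reports success before the final ROLLBACK.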
Same issue here: snowflake-connector-python==2.7.12, snowflake-sqlalchemy==1.4.1, SQLAlchemy==1.4.41. Tried downgrading and it didn't work for me.
I used write_pandas as a workaround:
from snowflake.connector.pandas_tools import write_pandas
import snowflake.connector
import pandas as pd
import os

con = snowflake.connector.connect(
    account=os.getenv("SNOWFLAKE_ACCOUNT"),
    user=os.getenv("SNOWFLAKE_USER"),
    password=os.getenv("SNOWFLAKE_PASSWORD"),
    database=os.getenv("SNOWFLAKE_DATABASE"),
    warehouse=os.getenv("SNOWFLAKE_WAREHOUSE"),
    role=os.getenv("SNOWFLAKE_ROLE"),
    schema=os.getenv("SNOWFLAKE_SCHEMA"),
)

table_name = "MYTABLE"
df = pd.read_csv("/path/to/data", delimiter="|")
df.reset_index(drop=True, inplace=True)

success, nchunks, nrows, _ = write_pandas(
    con,
    df,
    table_name,
    chunk_size=10000,
    auto_create_table=True,
    schema=os.getenv("SNOWFLAKE_SCHEMA"),
)
It's not quite as elegant as to_sql, but it got the job done.
Upgrading to snowflake-connector-python==2.8.0 fixed the problem for me.
I think I found the issue. I've been struggling with the same problem.
A bit of background: my company uses SSO, so I have to use the authenticator='externalbrowser' option in URL. Because of this, I set engine = create_engine(URL(...)).connect() instead of engine = create_engine(URL(...)), so that multiple pd.read_sql calls don't each trigger the login sequence. That works fine for reading data, but when inserting data it just creates the table in Snowflake and does not insert the rows. I fixed it by removing the .connect(), and now I see the rows being inserted.
In your code, try replacing

with engine.connect() as conn:
    df.to_sql(name=table_name, con=conn, index=False, method=pd_writer, if_exists='replace')

with simply

df.to_sql(name=table_name, con=engine, index=False, method=pd_writer, if_exists='replace')

To the developers: I'm not sure whether this is a pd_writer issue or a pandas issue. Would you be able to alter the relevant module to accept either engine or engine.connect() in the con argument?
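For context, pd_writer is essentially a thin adapter: it collects the rows pandas hands it into a DataFrame and passes them to write_pandas on the raw Snowflake connection underneath the SQLAlchemy one. A rough sketch of that shape (paraphrased, not the verbatim library source; see snowflake.connector.pandas_tools for the exact code):

import pandas
from snowflake.connector.pandas_tools import write_pandas

def pd_writer(table, conn, keys, data_iter):
    # unwrap the raw Snowflake DBAPI connection from the SQLAlchemy one
    sf_connection = conn.connection.connection
    df = pandas.DataFrame(data_iter, columns=keys)
    write_pandas(conn=sf_connection, df=df,
                 table_name=table.name, schema=table.schema)

Since the COPY INTO runs on that same underlying connection, a ROLLBACK issued when the outer SQLAlchemy transaction is discarded undoes it, which matches the log posted above.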
This worked for me as well, interesting. Thank you!
This worked for my use cases as well. Thank you!
Looking at the pandas code gives some more hints about the problem. See: https://github.com/pandas-dev/pandas/blob/v1.5.1/pandas/io/sql.py#L1396-L1399
Based on this, either @budaesandrei's solution works, or, if you would like to use engine.connect() as conn, use this:
with engine.connect() as conn:
    with conn.begin():
        df.to_sql(name=table_name, con=conn, index=False, method=pd_writer, if_exists='replace')
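The underlying behavior here is SQLAlchemy 1.4's transaction handling rather than anything Snowflake-specific. A minimal sketch of the three call patterns discussed in this thread (some_url and table t are placeholders):

from sqlalchemy import create_engine, text

engine = create_engine(some_url)

# 1. Passing the Engine: per the pandas source linked above, pandas calls
#    connectable.begin(), which for an Engine opens a connection and
#    commits when the block exits cleanly.
with engine.begin() as conn:
    conn.execute(text("INSERT INTO t VALUES (1)"))  # committed

# 2. A bare Connection: SQLAlchemy 1.4 autobegins an implicit transaction
#    that is rolled back when the connection closes without an explicit
#    commit -- the silent ROLLBACK seen at the end of the log above.
with engine.connect() as conn:
    conn.execute(text("INSERT INTO t VALUES (2)"))  # rolled back

# 3. Explicit conn.begin(): the transaction commits on exit, which is why
#    the workaround just above restores the inserts.
with engine.connect() as conn:
    with conn.begin():
        conn.execute(text("INSERT INTO t VALUES (3)"))  # committed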