SNOW-664933: session.create_dataframe() crashes when the Pandas dataframe is empty.
Please answer these questions before submitting your issue. Thanks!
- What version of Python are you using?
Python 3.8
- What operating system and processor architecture are you using?
Snowflake Python UDF
- What are the component versions in the environment (pip freeze)?
Only snowflake-snowpark-python
- What did you do?
Using session.create_dataframe() does not work with an empty pandas.DataFrame object.
Here's the full code:
orders_table = session.table('orders')
pandas_df = orders_table.to_pandas()
filtered_pandas_df = pandas_df.loc[pandas_df['CUSTOMER_ID'] == '<DOESNT_EXIST>']
# filtered_pandas_df is an empty Pandas dataframe.
snowpark_df = session.create_dataframe(filtered_pandas_df)
# Here I got an exception.
Here it is:
File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/snowpark/session.py:1066, in Session.write_pandas(self, df, table_name, database, schema, chunk_size, compression, on_error, parallel, quote_identifiers, auto_create_table, create_temp_table, overwrite)
1060 else:
1061 location = (
1062 (database + "." if database else "")
1063 + (schema + "." if schema else "")
1064 + (table_name)
1065 )
-> 1066 success, nchunks, nrows, ci_output = write_pandas(
1067 self._conn._conn,
1068 df,
1069 table_name,
1070 database=database,
1071 schema=schema,
1072 chunk_size=chunk_size,
1073 compression=compression,
1074 on_error=on_error,
1075 parallel=parallel,
1076 quote_identifiers=quote_identifiers,
1077 auto_create_table=auto_create_table,
1078 create_temp_table=create_temp_table,
1079 overwrite=overwrite,
1080 )
1081 except ProgrammingError as pe:
1082 if pe.msg.endswith("does not exist"):
File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py:183, in write_pandas(conn, df, table_name, database, schema, chunk_size, compression, on_error, parallel, quote_identifiers, auto_create_table, create_temp_table, overwrite, table_type)
180 raise
182 with TemporaryDirectory() as tmp_folder:
--> 183 for i, chunk in chunk_helper(df, chunk_size):
184 chunk_path = os.path.join(tmp_folder, f"file{i}.txt")
185 # Dump chunk into parquet file
File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py:37, in chunk_helper(lst, n)
35 def chunk_helper(lst: T, n: int) -> Iterator[tuple[int, T]]:
36 """Helper generator to chunk a sequence efficiently with current index like if enumerate was called on sequence."""
---> 37 for i in range(0, len(lst), n):
38 yield int(i / n), lst[i : i + n]
ValueError: range() arg 3 must not be zero
- What did you expect to see?
I expected to get an empty snowflake.snowpark.DataFrame with the defined columns so that I'd be able to write it to a table with no rows.
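The traceback above ends in `chunk_helper` calling `range()` with a step of zero. A minimal sketch that reproduces the same error without Snowflake is shown here. It copies the helper's logic from the traceback and assumes the chunk size ends up equal to the number of rows (zero for an empty dataframe); that assumption is not confirmed in this thread. `safe_chunk_helper` and `reproduce` are hypothetical names used only for illustration, not part of the connector.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T", bound=Sequence)


def chunk_helper(lst: T, n: int) -> Iterator[Tuple[int, T]]:
    """Same logic as the helper shown in the traceback."""
    for i in range(0, len(lst), n):
        yield int(i / n), lst[i : i + n]


def safe_chunk_helper(lst: T, n: int) -> Iterator[Tuple[int, T]]:
    """Hypothetical guarded variant: yields no chunks for empty input."""
    if len(lst) == 0:
        return
    yield from chunk_helper(lst, n)


def reproduce() -> str:
    """Call chunk_helper the way an empty input would (assumed n == len)."""
    rows: list = []
    try:
        list(chunk_helper(rows, len(rows)))
    except ValueError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce())
    print(list(safe_chunk_helper([], 0)))
```

The guard is only one possible way to avoid the error; the actual fix in the library may look different.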
@elongl Thanks for your feedback! Ack it's a bug and we'll look into it. Meanwhile, if you want to have an empty Snowpark DataFrame with the defined columns, you can do:
from snowflake.snowpark.types import IntegerType, StructField, StructType

schema = StructType(
    [StructField("a", IntegerType()), StructField("b", IntegerType())]
)
df = session.create_dataframe([], schema=schema)
cc @sfc-gh-stan
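Until the bug is fixed, the workaround above could be wrapped so the original call site keeps working for both empty and non-empty data. This is only a sketch: `create_dataframe_allowing_empty` is a hypothetical helper name, and the caller is assumed to supply a `StructType` schema matching the pandas columns, since the helper does not infer one.

```python
def create_dataframe_allowing_empty(session, pandas_df, schema):
    """Hypothetical wrapper around session.create_dataframe().

    For an empty pandas DataFrame, fall back to the workaround from this
    thread (an empty list plus an explicit schema). Otherwise pass the
    pandas DataFrame through unchanged.
    """
    if len(pandas_df) == 0:  # len() of a pandas DataFrame is its row count
        return session.create_dataframe([], schema=schema)
    return session.create_dataframe(pandas_df)
```

With the names from the original report, the call would be `create_dataframe_allowing_empty(session, filtered_pandas_df, schema)`.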
@sfc-gh-jdu @sfc-gh-stan
Thank you very much for addressing the issue. Did you get a chance to look at the other one I created? It's a bit more important I believe.
Here it is.