SNOW-701383: Upload Arrow dataset to Snowflake
What is the current behavior?
I do not think there is a way to upload an Arrow table to Snowflake via snowflake-connector-python.
What is the desired behavior?
Given an Arrow table, I would like to upload it to Snowflake without going via pandas (which would result in the `value` column being cast to float, since NumPy integer types don't support null values).
Here is an example of a frame I would like to upload to Snowflake, by creating or appending to a table.
from datetime import date
import pyarrow as pa
id = pa.array(["A", "B", "E"])
date_day = pa.array([date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3)])
value = pa.array([2, 4, None])
columns = ["id", "date_day", "value"]
df = pa.Table.from_arrays([id, date_day, value], names=columns)
df
----
pyarrow.Table
id: string
date_day: date32[day]
value: int64
----
id: [["A","B","E"]]
date_day: [[2022-01-01,2022-01-02,2022-01-03]]
value: [[2,4,null]]
Losing datatype information
print(df.to_pandas())
--- # Note that the `value` column now is floating point
id date_day value
0 A 2022-01-01 2.0
1 B 2022-01-02 4.0
2 E 2022-01-03 NaN
How would this improve snowflake-connector-python?
Arrow is the next-generation data format for dataframe types. Libraries like polars already take great advantage of it, and it would be good to support this in order to allow for more correct data interactions with Snowflake.
Currently, the connector already allows fetching data in the Arrow format but, as far as I can see, not writing it.
Hi, I would love to know whether this is being worked on. Really looking forward to it.
Hey, any updates on this?
Hi, and thank you for submitting this improvement request. The team will consider it.
+1, it would be great to have Polars DataFrame support similar to the existing pandas support, something like a polars_tools.py with no dependency on pandas.
https://pola.rs/
+1