
SNOW-701383: Upload Arrow dataset to Snowflake

Open thomasaarholt opened this issue 3 years ago • 5 comments

What is the current behavior?

I do not think there is currently a way to upload Arrow data (e.g. a `pyarrow.Table`) to Snowflake via snowflake-connector-python.

What is the desired behavior?

Given an Arrow table, I would like to upload it to Snowflake without going via pandas (which would cast the `value` column to float, since NumPy integer types don't support null values).

Here is an example of a table I would like to upload to Snowflake, either by creating a new table or by appending to an existing one.

from datetime import date
import pyarrow as pa
id = pa.array(["A", "B", "E"])
date_day = pa.array([date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3)])
value = pa.array([2, 4, None])
columns = ["id", "date_day", "value"]

df = pa.Table.from_arrays([id, date_day, value], names=columns)
df
----
pyarrow.Table
id: string
date_day: date32[day]
value: int64
----
id: [["A","B","E"]]
date_day: [[2022-01-01,2022-01-02,2022-01-03]]
value: [[2,4,null]]

Losing data type information when converting to pandas

print(df.to_pandas())
# Output (note that the `value` column is now floating point):
  id    date_day  value
0  A  2022-01-01    2.0
1  B  2022-01-02    4.0
2  E  2022-01-03    NaN

How would this improve snowflake-connector-python?

Apache Arrow is becoming the standard in-memory columnar format for dataframe libraries. Libraries like Polars are built on it, and supporting it would allow data to be written to Snowflake without lossy type conversions such as the integer-to-float cast shown above.

Currently, the connector already supports fetching data in Arrow format (for example via `cursor.fetch_arrow_all()`), but as far as I can see, not writing it.

thomasaarholt, Nov 29 '22 19:11

Hi, I would love to know whether this is being worked on. Really looking forward to it.

prabodh1194, Sep 24 '23 17:09

Hey, any updates on this?

mohith7548, Feb 09 '24 11:02

Hi, and thank you for submitting this improvement request. The team will consider it.

sfc-gh-dszmolka, Mar 08 '24 16:03

+1, it would be great to have Polars DataFrame support similar to the existing pandas support: something like a `polars_tools.py` counterpart to `pandas_tools.py`, with no dependency on pandas.

https://pola.rs/

ismailsimsek, Mar 15 '24 08:03

+1

sidhreddy, Sep 04 '24 02:09