snowflake-connector-python

SNOW-591884: pyarrow dependencies

Open DannyVarod opened this issue 2 years ago • 4 comments

While Snowflake's library is a library we use, it isn't the only library we use and require. Each library has a different version dependency of pyarrow, which can lead to conflicts.

Please do NOT make this library tied to a specific version of pyarrow, especially since not every version is a breaking change. Instead, it should be tested with, and allow the use of, various versions of pyarrow by checking which version is installed and adapting accordingly.
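For illustration, a runtime check of the kind requested here might look roughly like the sketch below. It uses only the standard library's `importlib.metadata`. The helper names (`parse_major`, `installed_major`, `choose_arrow_path`) and the `minimum=8` threshold are hypothetical and are not part of snowflake-connector-python. A check like this only covers pure-Python usage; it would not by itself make a compiled extension compatible with a different pyarrow build.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_major(raw: str) -> int:
    """Extract the leading major number from a version string such as '10.0.1'."""
    match = re.match(r"\d+", raw)
    if match is None:
        raise ValueError(f"unrecognised version string: {raw!r}")
    return int(match.group())


def installed_major(dist_name: str = "pyarrow") -> Optional[int]:
    """Return the major version of an installed distribution, or None if absent."""
    try:
        return parse_major(version(dist_name))
    except PackageNotFoundError:
        return None


def choose_arrow_path(major: Optional[int], minimum: int = 8) -> str:
    """Pick a (hypothetical) code path based on the detected major version."""
    if major is None:
        return "disabled"      # pyarrow not installed: skip Arrow features
    if major < minimum:
        return "unsupported"   # too old for the assumed minimum
    return "enabled"           # any newer major is accepted in this sketch
```

Calling `choose_arrow_path(installed_major())` at import time would let a library adapt to whatever is installed instead of refusing to install.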

DannyVarod avatar May 18 '22 08:05 DannyVarod

This would be extremely helpful

jordantshaw avatar May 20 '22 15:05 jordantshaw

> Please do NOT make this library tied to a specific version of pyarrow

I'm sorry, but we cannot loosen our pinning of pyarrow. However, I think that we have a good reason for this. Our C extension is compiled against a specific pyarrow version, and Arrow's internal APIs do change between major releases.

sfc-gh-mkeller avatar May 20 '22 23:05 sfc-gh-mkeller

Upvote, even though I realize that the suggestion is a bit of a pain. Or perhaps arrow could be an optional install? I suspect many will uninstall pyarrow after chasing the cause of their segfault, since they may just use pandas.

microprediction avatar Jun 06 '22 15:06 microprediction

Upvote as well

Satchitananda avatar Jul 14 '22 07:07 Satchitananda

> I'm sorry, but we cannot loosen our pinning of pyarrow.

@sfc-gh-mkeller would you please elaborate? Perhaps I'm missing something, but AFAICT, this package makes very limited use of pyarrow, to implement a few methods (like SnowflakeCursor.fetch_arrow_batches) that are entirely optional to use—not core functionality. Moreover, in looking at the issues in this repo, it seems that pyarrow upgrades are consistently treated as relatively low priority, with https://github.com/snowflakedb/snowflake-connector-python/pull/1349 being the latest example.

The end result is that many users of this package are blocked from making meaningful upgrades, e.g. to pyarrow 10 or Python 3.11, because a few ancillary features in this package require an outdated version of a great library.

Is there a reason, besides limited developer time, that pyarrow upgrades can't be prioritized, or better yet, the package made optional? Is there data showing heavy use of the pyarrow-based features, and/or performance tests that show significant improvements from using it? (Absent any data, I'd guess the latter is not true, since the Snowflake API always returns row batches of JSON strings. If pyarrow was being used to implement the Flight SQL protocol, the story might be different.)

Please consider changing the policy here. It would be a big win for many users, with zero cost for users who want to continue using a Snowflake-compatible version of pyarrow.
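To make the "optional" idea concrete, one common pattern is to import the dependency lazily, only when a feature that needs it is called. The sketch below is a generic illustration of that pattern and not the connector's actual code; `require_optional` and the wording of the error message are assumptions.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency on first use, with a clear error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail only when the optional feature is actually requested.
        raise RuntimeError(
            f"{feature} requires the optional package '{module_name}'; "
            f"install it to enable this feature."
        ) from exc
```

With this approach, a method such as an Arrow batch fetcher would call `require_optional("pyarrow", "Arrow result batches")` in its body, so users who never call it would not need pyarrow installed at all.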

patrickmckenna avatar Dec 13 '22 22:12 patrickmckenna

I am currently trying to upgrade a project to Python 3.11 but am blocked by this issue. pyarrow is on version 10.x, but snowflake-connector-python specifies 8.x, which will not build under Python 3.11, so our web app cannot be upgraded.
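For anyone wanting to confirm which pin their installed connector actually declares, the standard library can list a distribution's declared requirements. This is a small sketch; `requirement_name` and `declared_pins` are hypothetical helper names, and the result depends entirely on what is installed in your environment.

```python
import re
from importlib.metadata import PackageNotFoundError, requires
from typing import List


def requirement_name(requirement: str) -> str:
    """Return the project name at the start of a requirement string, lowercased."""
    return re.split(r"[\s<>=!~;\[(]", requirement, maxsplit=1)[0].lower()


def declared_pins(dist_name: str, dependency: str) -> List[str]:
    """List the requirement strings that dist_name declares for dependency."""
    try:
        declared = requires(dist_name) or []  # requires() may return None
    except PackageNotFoundError:
        return []  # the distribution is not installed here
    return [r for r in declared if requirement_name(r) == dependency.lower()]
```

Running `declared_pins("snowflake-connector-python", "pyarrow")` in the affected environment should show the version range that the resolver is trying to satisfy.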

shacker avatar Dec 20 '22 01:12 shacker

Similarly here, I'm trying to train some recommender systems with torchrec that uses pyarrow=10.0.1 but I wasn't able to find a snowflake-connector-python version to match those requirements 😢

SetaSouto avatar Dec 29 '22 14:12 SetaSouto

Same here, I have newer pyarrow installed and snowflake-connector-python always failed on installation.

ForsakenRei avatar Jan 27 '23 18:01 ForsakenRei

Would love to see this made optional or upgraded as well.

akravetz avatar Mar 09 '23 15:03 akravetz

@akravetz I just upgraded to snowflake-connector 3.x, then tried updating to python3.11 again, and this time it worked. Problem solved!

shacker avatar Mar 10 '23 06:03 shacker

We are looking into improving our arrow dependency story over the next couple of quarters and will have an update here by the end of May 2023

sfc-gh-achandrasekaran avatar Mar 23 '23 21:03 sfc-gh-achandrasekaran

You're behind a major version release at this point, I think: 11 vs. 10.

dss010101 avatar May 02 '23 16:05 dss010101

👍 pandas and pyarrow restrictions will make snowflake-connector-python unusable with the latest versions of dask.

j-bennet avatar May 04 '23 19:05 j-bennet

Hi all,

We have released a new preview version of the connector that uses nanoarrow, which reduces the package size and removes the pyarrow dependency restriction. You can read about it in this blog post: https://medium.com/snowflake/supercharging-the-snowflake-python-connector-with-nanoarrow-8388cb57eeba

Please let us know your feedback. Note that this is still in preview, so we don't recommend using it in production.

Thanks, Anurag

sfc-gh-anugupta avatar Jul 26 '23 19:07 sfc-gh-anugupta

Hi all, we're thrilled to announce that snowflake-connector-python 3.5.0 has been released. It removes the pyarrow dependency restriction and reduces the package size: https://pypi.org/project/snowflake-connector-python/3.5.0/

Please give it a try!

sfc-gh-aling avatar Nov 14 '23 17:11 sfc-gh-aling