Pandas 2.0 conflicts with the SQL connectors

Open gventuri opened this issue 10 months ago • 17 comments

System Info

OS version: any
Python version: any
pandasai: any

🐛 Describe the bug

Our current setup runs into problems when upgrading to pandas 2.0 or later, because the upgrade interferes with our SQL connectors. We need to find the root cause so that we can move to a more recent version of pandas, as well as to the latest version of modin.

gventuri • Apr 02 '24 18:04

@gventuri In pandas 2.0, the read_sql function now mandates SQLAlchemy version 2.0 or higher. However, upgrading SQLAlchemy may cause certain connectors to break, as they do not yet support SQLAlchemy 2.0 or above.

ArslanSaleem • Apr 03 '24 05:04

@ArslanSaleem, do you have any insight into which connectors do not support SQLAlchemy 2.0 or above?

YarShev • Apr 11 '24 13:04

@ArslanSaleem it looks like snowflake-connector-python[pandas] does not require sqlalchemy at all, and the latest version of snowflake-sqlalchemy seems to allow sqlalchemy 2.0.29. As @YarShev asked, please let us know which Snowflake connector is not working for you, and share your Python and operating system versions.

sfc-gh-mvashishtha • Apr 12 '24 23:04

@sfc-gh-mvashishtha xref: https://github.com/snowflakedb/snowflake-sqlalchemy/issues/380. It looks like the latest release on Apr 11 (EDIT: that release got yanked, no reason provided) might have addressed it.

Running pip install snowflake-sqlalchemy==1.5.1 'sqlalchemy>2.0' fails with:

ERROR: Cannot install snowflake-sqlalchemy==1.5.1 and sqlalchemy>2.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested sqlalchemy>2.0
    snowflake-sqlalchemy 1.5.1 depends on sqlalchemy<2.0.0 and >=1.4.0

Not sure if this is the core issue; if not, apologies!
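The pip error above boils down to two version ranges that do not overlap: sqlalchemy>2.0 from the command line and sqlalchemy>=1.4.0,<2.0.0 from snowflake-sqlalchemy 1.5.1. A minimal sketch of that check, under a deliberately simplified model of versions (plain integer tuples, no pre-releases, post-releases or other PEP 440 rules) and with hypothetical names (`Range`, `parse`, `_pick`); it is not how pip's resolver is implemented:

```python
# Simplified model of why pip cannot satisfy both requirements:
#   user request:                sqlalchemy>2.0
#   snowflake-sqlalchemy 1.5.1:  sqlalchemy>=1.4.0,<2.0.0
# Versions are plain integer tuples; pre-releases, post-releases and other
# PEP 440 details are ignored on purpose. This is not pip's actual resolver.
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, ...]
Bound = Tuple[Optional[Version], bool]  # (version, or None if unbounded; inclusive?)


def parse(v: str) -> Version:
    """Turn a plain release string such as '2.0' into a padded tuple (2, 0, 0)."""
    parts = tuple(int(p) for p in v.split("."))
    return parts + (0,) * (3 - len(parts))


def _pick(a: Bound, b: Bound, prefer_higher: bool) -> Bound:
    """Return the tighter of two bounds (higher lower bound / lower upper bound)."""
    if a[0] is None:
        return b
    if b[0] is None:
        return a
    if a[0] == b[0]:
        return (a[0], a[1] and b[1])  # same version: inclusive only if both are
    if prefer_higher:
        return a if a[0] > b[0] else b
    return a if a[0] < b[0] else b


@dataclass(frozen=True)
class Range:
    lower: Bound = (None, True)
    upper: Bound = (None, False)

    def intersect(self, other: "Range") -> "Range":
        return Range(
            lower=_pick(self.lower, other.lower, prefer_higher=True),
            upper=_pick(self.upper, other.upper, prefer_higher=False),
        )

    def is_empty(self) -> bool:
        (lo, lo_inc), (hi, hi_inc) = self.lower, self.upper
        if lo is None or hi is None:
            return False
        if lo != hi:
            return lo > hi
        return not (lo_inc and hi_inc)


requested = Range(lower=(parse("2.0"), False))  # sqlalchemy>2.0
snowflake_151 = Range(lower=(parse("1.4.0"), True), upper=(parse("2.0.0"), False))
print(requested.intersect(snowflake_151).is_empty())  # True: no version satisfies both
```

Running the same intersection with pandasai's previous constraint (sqlalchemy>=1.4,<3) against snowflake_151 gives a non-empty range, which matches the thread: the conflict only appears once SQLAlchemy 2.0 or later is required.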

asishm • Apr 14 '24 13:04

@asishm @sfc-gh-mvashishtha from what I see, it might only be a problem with pandas >= 2.2. Can we assume we can safely migrate to pandas >=2.0,<2.2 for the time being?

@ArslanSaleem would that work?

gventuri • Apr 15 '24 18:04

@gventuri, pandas added support for SQLAlchemy 2.0 as of pandas 2.0.

YarShev • Apr 15 '24 20:04

Quoting @sfc-gh-mvashishtha: "@ArslanSaleem it looks like snowflake-connector-python[pandas] does not require sqlalchemy at all, and the latest version of snowflake-sqlalchemy seems to allow sqlalchemy 2.0.29. As @YarShev asked, please let us know which Snowflake connector is not working for you, and share your Python and operating system versions."

@sfc-gh-mvashishtha, @ArslanSaleem, given that, can we assume there is no issue with the latest version of snowflake-sqlalchemy?

YarShev • Apr 15 '24 20:04

@YarShev There is. It's not possible to use pandas 2.2 and snowflake-sqlalchemy in the same project.

  • pandas>=2.2 requires sqlalchemy>=2.0: https://github.com/pandas-dev/pandas/blob/main/pyproject.toml#L115
  • snowflake-sqlalchemy requires sqlalchemy>=1.4.19,<2.0.0: https://github.com/snowflakedb/snowflake-sqlalchemy/blob/main/pyproject.toml#L41
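Because the sqlalchemy>=2.0 requirement linked above sits in pandas' optional dependencies, pip will still install pandas >= 2.2 next to SQLAlchemy 1.x (for example when snowflake-sqlalchemy pins SQLAlchemy below 2.0); the mismatch typically surfaces only when pandas' SQL functions are used. A minimal sketch of a hypothetical import-time check, using only the standard library's `importlib.metadata`; `release_tuple`, `sqlalchemy_conflict` and `installed` are illustrative names, not pandas or pandasai APIs, and the 2.2 / 2.0 thresholds are taken from the two bullets above:

```python
# Hypothetical guard: flag pandas >= 2.2 (needs sqlalchemy >= 2.0) installed
# alongside SQLAlchemy 1.x. Not part of pandas or pandasai.
import re
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple


def release_tuple(v: str) -> Tuple[int, ...]:
    """Leading numeric release segment: '2.2.1' -> (2, 2, 1), '2.0.0rc1' -> (2, 0, 0)."""
    nums: List[int] = []
    for part in v.split("."):
        m = re.match(r"\d+", part)
        if m is None:
            break
        nums.append(int(m.group()))
        if m.end() != len(part):  # suffix such as 'rc1' or '+local': stop here
            break
    return tuple(nums)


def sqlalchemy_conflict(pandas_version: str, sqlalchemy_version: str) -> Optional[str]:
    """Return a description of the problem, or None if the pair looks compatible."""
    if release_tuple(pandas_version) >= (2, 2) and release_tuple(sqlalchemy_version) < (2, 0):
        return (f"pandas {pandas_version} expects SQLAlchemy>=2.0, "
                f"but SQLAlchemy {sqlalchemy_version} is installed")
    return None


def installed(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


pd_v, sa_v = installed("pandas"), installed("sqlalchemy")
if pd_v and sa_v:
    print(sqlalchemy_conflict(pd_v, sa_v))
```

The version parsing is intentionally crude; a real check would more likely use a proper PEP 440 parser, but the comparison logic stays the same.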

Someone tried to add support for 2.0 on their own almost a year ago and sent this PR to the snowflake-sqlalchemy repo: https://github.com/snowflakedb/snowflake-sqlalchemy/pull/414, but it was completely ignored and is now closed.

There are many open issues in snowflake-sqlalchemy about this lack of support for sqlalchemy>=2.0, the main one being https://github.com/snowflakedb/snowflake-sqlalchemy/issues/380.

In that same issue on Mar 9, someone from the snowflake team said (regarding support for sqlalchemy>=2.0):

I can confirm the implementation is currently in progress and we plan to release it by end of Q1 (April 2024). Please note this is not a committed-to date, just a rough estimation which is subject to change.

Hopefully we'll see support for sqlalchemy 2 by the end of April, but I wouldn't bet on it.

If snowflake-sqlalchemy is the only connector blocking the upgrade, I wouldn't wait much longer, unless the Snowflake team provides a committed date for supporting the new version.

rafaelclp • Apr 18 '24 15:04

@rafaelclp then what would you recommend? We could also consider installing Snowflake from that feature branch for the short term hoping they'll eventually fix it?

gventuri • Apr 19 '24 07:04

@ArslanSaleem is only Snowflake blocking the upgrade as far as you know?

gventuri • Apr 19 '24 07:04

Quoting @gventuri: "@rafaelclp then what would you recommend? We could also consider installing Snowflake from that feature branch for the short term hoping they'll eventually fix it?"

This is just my opinion, so hardly a recommendation. My reasoning is that people using that connector are already stuck with older pandas versions and a year-old SQLAlchemy version, so I don't see why they couldn't afford to be stuck with older pandas-ai versions as well. That's all.

Edit: if someone absolutely must use the latest version of pandas-ai, I think your own suggestion is indeed the best option: install snowflake-sqlalchemy from the PR that adds support for SQLAlchemy 2.0 while waiting for it to be merged.

In any case, I was wrong in betting against support for sqlalchemy 2 in snowflake-sqlalchemy by the end of April: https://github.com/snowflakedb/snowflake-sqlalchemy/pull/469. It's almost here!
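Installing snowflake-sqlalchemy from a pull request, as suggested above, can be expressed as a PEP 508 direct reference that points pip at the PR's `refs/pull/<n>/head` ref. A minimal sketch that only builds (and prints) the command rather than running it; `pr_requirement` is a hypothetical helper, and PR #469 is used simply because it is the one linked above:

```python
# Build, but do not run, a pip command that installs a package from the head
# of a GitHub pull request using a PEP 508 direct reference.
import shlex
import sys


def pr_requirement(dist: str, repo: str, pr: int) -> str:
    """Direct reference to refs/pull/<pr>/head of github.com/<repo>."""
    return f"{dist} @ git+https://github.com/{repo}.git@refs/pull/{pr}/head"


req = pr_requirement("snowflake-sqlalchemy", "snowflakedb/snowflake-sqlalchemy", 469)
cmd = [sys.executable, "-m", "pip", "install", req]
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

A PR head moves as new commits are pushed, so for anything reproducible it is probably safer to pin a specific commit SHA instead of `refs/pull/469/head`, and to switch back to a normal version constraint once a release with SQLAlchemy 2.0 support is published.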

rafaelclp • Apr 22 '24 14:04

Guys, latest snowflake-sqlalchemy finally supports sqlalchemy>2 (link). Is everyone okay if I open a PR upgrading pandas?

YarShev • Jul 09 '24 17:07

@YarShev sure, would be great. Let's try to upgrade pandas and sqlalchemy, so we can merge it soon! Thanks a lot @YarShev!!

gventuri • Jul 09 '24 18:07

@gventuri, it seems sqlalchemy-databricks depends on SQLAlchemy (>=1,<2). When I make the following changes to pyproject.toml:

diff --git a/pyproject.toml b/pyproject.toml
index c94e4c3..0c3dffc 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,18 +10,18 @@ packages = [{include = "pandasai"}]
 [tool.poetry.dependencies]
 python = ">=3.9,<3.9.7 || >3.9.7,<4.0"
 python-dotenv = "^1.0.0"
-pandas = "1.5.3"
+pandas = ">=2.0,<3.0"
 astor = "^0.8.1"
 openai = "<2"
 matplotlib = "^3.7.1"
 pydantic = ">=1,<3"
-sqlalchemy = ">=1.4,<3"
+sqlalchemy = ">=2.0,<3"
 duckdb = "<1"
 faker = "^19.12.0"
 pillow = "^10.1.0"
 requests = "^2.31.0"
 jinja2 = "^3.1.3"
-modin = {version = "0.18.1", optional = true, extras=["ray"]}
+modin = {version = ">=0.23.0", optional = true, extras=["ray"]}
 beautifulsoup4 = {version="^4.12.2", optional = true}
 google-generativeai = {version = "^0.3.2", optional = true}
 google-cloud-aiplatform = {version = "^1.26.1", optional = true}
@@ -42,7 +42,7 @@ pymysql = { version = "^1.1.0", optional = true }
 psycopg2-binary = { version = "^2.9.7", optional = true }
 yfinance = { version = "^0.2.28", optional = true }
 sqlalchemy-databricks = { version = "^0.2.0", optional = true }
-snowflake-sqlalchemy = { version = "^1.5.0", optional = true }
+snowflake-sqlalchemy = { version = "^1.6.1", optional = true }

I get the following error.

Because sqlalchemy-databricks (0.2.0) depends on SQLAlchemy (>=1,<2)
 and no versions of sqlalchemy-databricks match >0.2.0,<0.3.0, sqlalchemy-databricks (>=0.2.0,<0.3.0) requires SQLAlchemy (>=1,<2).
So, because pandasai depends on both sqlalchemy (>=2.0,<3) and sqlalchemy-databricks (^0.2.0), version solving failed.

What can we do in this case?
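The "sqlalchemy-databricks (>=0.2.0,<0.3.0)" in the error above is Poetry's expansion of the ^0.2.0 caret constraint: for a 0.x version the caret only allows upgrades that keep the leftmost non-zero component fixed, so the solver never considers anything past 0.2.x. A minimal sketch of that expansion rule (the `caret_bounds` name is hypothetical, and only plain numeric versions are handled):

```python
# Expand a Poetry-style caret constraint into (inclusive lower, exclusive upper).
# Rule: bump the leftmost non-zero component; if every given component is zero,
# bump the last one given. Plain numeric versions only.
from typing import List, Tuple


def caret_bounds(spec: str) -> Tuple[str, str]:
    if not spec.startswith("^"):
        raise ValueError(f"not a caret constraint: {spec!r}")
    parts: List[int] = [int(p) for p in spec[1:].split(".")]
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:bump] + [parts[bump] + 1]
    lower = parts + [0] * (3 - len(parts))
    upper = upper + [0] * (3 - len(upper))
    return ".".join(map(str, lower)), ".".join(map(str, upper))


print(caret_bounds("^0.2.0"))  # ('0.2.0', '0.3.0'), matching the error message
print(caret_bounds("^1.6.1"))  # ('1.6.1', '2.0.0')
```

So loosening ^0.2.0 would not help on its own here: the blocker is sqlalchemy-databricks' own SQLAlchemy (>=1,<2) requirement, not the caret.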

YarShev • Jul 09 '24 18:07

@YarShev I believe that package is deprecated. This one (https://github.com/databricks/databricks-sql-python) is the officially maintained one, and it supports sqlalchemy >2.0.

We should probably figure out how hard it would be to migrate to it. I had a quick look, and it seems like quite a straightforward migration! Is it the only blocker?
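For a sense of what the migration touches: assuming sqlalchemy-databricks used URLs of the form databricks+connector://token:<token>@<host>:443/<database> with http_path passed through connect_args, the dialect bundled with databricks-sql-connector 3.x is assumed here to take everything in the URL instead, as databricks://token:<token>@<host>?http_path=...&catalog=...&schema=... . A minimal sketch of a hypothetical `databricks_url` helper for the new form; all values are placeholders, and the URL shape should be checked against the docs for the connector version actually installed:

```python
# Hypothetical helper: build a SQLAlchemy URL for the "databricks" dialect
# assumed to ship with databricks-sql-connector 3.x. Standard library only.
from urllib.parse import quote, urlencode


def databricks_url(token: str, host: str, http_path: str, catalog: str, schema: str) -> str:
    """databricks://token:<token>@<host>?http_path=...&catalog=...&schema=..."""
    # urlencode percent-encodes the values (e.g. '/' -> '%2F'); SQLAlchemy
    # decodes query parameters again when it parses the URL.
    query = urlencode({"http_path": http_path, "catalog": catalog, "schema": schema})
    return f"databricks://token:{quote(token, safe='')}@{host}?{query}"


url = databricks_url(
    token="dapi-example",                    # placeholder access token
    host="example.cloud.databricks.com",     # placeholder workspace host
    http_path="/sql/1.0/warehouses/abc123",  # placeholder SQL warehouse path
    catalog="main",
    schema="default",
)
print(url)
# With databricks-sql-connector installed, the engine would then be:
#   from sqlalchemy import create_engine
#   engine = create_engine(url)
```

Building the URL in one place like this would keep the connector-specific details out of the rest of the connector code, which is where most of the migration effort would likely be.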

gventuri • Jul 09 '24 19:07

I tried replacing sqlalchemy-databricks with databricks-sql-python as follows:

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,18 +10,18 @@ packages = [{include = "pandasai"}]
 [tool.poetry.dependencies]
 python = ">=3.9,<3.9.7 || >3.9.7,<4.0"
 python-dotenv = "^1.0.0"
-pandas = "1.5.3"
+pandas = ">=2.0,<3.0"
 astor = "^0.8.1"
 openai = "<2"
 matplotlib = "^3.7.1"
 pydantic = ">=1,<3"
-sqlalchemy = ">=1.4,<3"
+sqlalchemy = ">=2.0,<3"
 duckdb = "<1"
 faker = "^19.12.0"
 pillow = "^10.1.0"
 requests = "^2.31.0"
 jinja2 = "^3.1.3"
-modin = {version = "0.18.1", optional = true, extras=["ray"]}
+modin = {version = ">=0.23.0", optional = true, extras=["ray"]}
 beautifulsoup4 = {version="^4.12.2", optional = true}
 google-generativeai = {version = "^0.3.2", optional = true}
 google-cloud-aiplatform = {version = "^1.26.1", optional = true}
@@ -41,8 +41,8 @@ openpyxl = { version = "^3.0.7", optional = true }
 pymysql = { version = "^1.1.0", optional = true }
 psycopg2-binary = { version = "^2.9.7", optional = true }
 yfinance = { version = "^0.2.28", optional = true }
-sqlalchemy-databricks = { version = "^0.2.0", optional = true }
-snowflake-sqlalchemy = { version = "^1.5.0", optional = true }
+databricks-sql-python = { version = "3.2.0", optional = true }
+snowflake-sqlalchemy = { version = "^1.6.1", optional = true }
 flask = { version = "^3.0.2", optional = true }
 sqlalchemy-cockroachdb = { version = "^2.0.2", optional = true }
 sqlalchemy-bigquery = {version = "^1.8.0", optional = true, markers = "python_version >= '3.8' and python_version < '3.13'"}
@@ -71,7 +71,7 @@ sourcery = "^1.11.0"


 [tool.poetry.extras]
-connectors = [ "pymysql", "psycopg2-binary", "sqlalchemy-cockroachdb", "sqlalchemy-databricks", "sqlalchemy-bigquery", "snowflake-sqlalchemy", "cx-Oracle"]
+connectors = [ "pymysql", "psycopg2-binary", "sqlalchemy-cockroachdb", "databricks-sql-python", "sqlalchemy-bigquery", "snowflake-sqlalchemy", "cx-Oracle"]

but got this error.

Because pandasai depends on databricks-sql-python (^3.2.0) which doesn't match any versions, version solving failed.

Do you have any insights on this? databricks-sql-python 3.2.0 is available on PyPI though.

YarShev • Jul 09 '24 20:07

My bad :) The package on PyPI is databricks-sql-connector, not databricks-sql-python. I was able to generate a new lock file, but pandas still doesn't resolve to the latest version. There are probably some other cross-dependencies constraining it. I opened #1272; let's continue there.

YarShev • Jul 10 '24 08:07

@gventuri, is this issue planned to be fixed?

YarShev • Oct 20 '24 15:10