
Treat pandas as an optional dependency (#489)

Open gs11 opened this issue 11 months ago • 13 comments

What type of PR is this?

  • [ ] Refactor
  • [ ] Feature
  • [ ] Bug Fix
  • [X] Other

Description

This naive PR makes pandas an optional dependency, just like pyarrow. Tests fail, but seemingly for unrelated reasons.
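For context, the usual shape of an optional dependency in Python is a guarded import that falls back to `None`, with a descriptive error raised only when a pandas-specific feature is actually used. The sketch below assumes that shape; `rows_to_dataframe` and its error message are hypothetical and not part of the connector's API.

```python
from typing import Any, Sequence

try:
    import pandas  # optional: only needed for DataFrame conversion
except ImportError:  # pandas not installed; row-based fetching still works
    pandas = None


def rows_to_dataframe(rows: Sequence[Sequence[Any]], columns: Sequence[str]):
    """Hypothetical helper: convert fetched rows to a DataFrame if pandas exists."""
    if pandas is None:
        # Fail only at the point of use, with an actionable message.
        raise ImportError(
            "pandas is required for DataFrame conversion; "
            "install it with `pip install pandas`"
        )
    return pandas.DataFrame(list(rows), columns=list(columns))
```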

How is this tested?

  • [X] Unit tests
  • [X] E2E Tests
  • [ ] Manually
  • [ ] N/A

Related Tickets & Documents

https://github.com/databricks/databricks-sql-python/issues/489

gs11 • Mar 24 '25 15:03

Any chance of getting this reviewed? @deeksha-db @samikshya-db @jprakash-db @yunbodeng-db @jackyhu-db @benc-db

gs11 • Apr 11 '25 08:04

Hey @gs11, thanks for addressing this; I only just noticed your PR. To close #489, you should avoid importing pandas as a top-level import in src/databricks/sql/client.py. That would have been my strategy.

I really hope this gets through eventually.

FBruzzesi • Apr 18 '25 16:04
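One way to follow the suggestion to avoid a top-level pandas import, sketched under assumptions: import pandas only for type checkers via `typing.TYPE_CHECKING`, and defer the runtime import into the method that needs it. `ResultSetSketch` and its `to_pandas` method are hypothetical stand-ins, not the connector's real classes.

```python
from typing import TYPE_CHECKING, Any, List, Sequence

if TYPE_CHECKING:  # evaluated by type checkers only, never at runtime
    import pandas


class ResultSetSketch:
    """Hypothetical result container; importing this module never imports pandas."""

    def __init__(self, columns: Sequence[str], rows: List[Sequence[Any]]) -> None:
        self.columns = list(columns)
        self.rows = rows

    def to_pandas(self) -> "pandas.DataFrame":
        # Deferred import: pandas is loaded only when this method is called.
        try:
            import pandas
        except ImportError as exc:
            raise ImportError(
                "to_pandas() requires pandas; install it with `pip install pandas`"
            ) from exc
        return pandas.DataFrame(self.rows, columns=self.columns)
```

The string annotation `"pandas.DataFrame"` keeps type hints intact without forcing the import at runtime.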

> To close #489, you should avoid importing pandas as a top-level import in src/databricks/sql/client.py. That would have been my strategy.

Thanks, @FBruzzesi! I realized this change wasn't pushed properly to my branch. I've updated it and rebased to resolve the poetry.lock conflicts caused by upstream updates.

gs11 • Apr 18 '25 16:04

Hello, is there any chance of movement on this? The large pandas dependency is a blocker for my project.

dtroberts • Aug 14 '25 18:08

Any chance of getting this reviewed? @deeksha-db @samikshya-db @jprakash-db @yunbodeng-db @jackyhu-db @benc-db

gs11 • Sep 15 '25 17:09

Is there any way to speed up this PR? The underlying issue was opened almost a year ago, this PR is nearing half a year, and the issue still prevents us from using databricks-sql-python at all.

If there is any way for me to contribute, lmk.

ottovis • Dec 10 '25 12:12

@gs11, it looks like there are conflicts. My guess is that the lock file needs to be regenerated.

snowman2 • Jan 15 '26 20:01

Pandas is also imported here: https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/src/databricks/sql/result_set.py#L7

snowman2 • Jan 15 '26 20:01

I imagine they would want a warning message similar to the one for pyarrow: https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/src/databricks/sql/client.py#L81-L86

snowman2 • Jan 15 '26 20:01
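A hedged sketch of the pyarrow-style warning suggested above: attempt the import once at module load and log a warning, rather than failing, when pandas is missing. The `import_optional` helper, the logger name, and the message text are assumptions for illustration; they are not copied from client.py.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

logger = logging.getLogger("optional_deps_sketch")  # hypothetical logger name


def import_optional(name: str, feature: str) -> Optional[ModuleType]:
    """Hypothetical helper: import an optional dependency or log a warning."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Warn instead of raising so the core connector stays usable.
        logger.warning(
            "%s is not installed; %s will be unavailable. "
            "Install it with `pip install %s` to enable it.",
            name, feature, name,
        )
        return None


# Module-level usage, mirroring a top-of-file guarded import.
pandas = import_optional("pandas", "DataFrame conversion")
```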

> Pandas is also imported here:
>
> https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/src/databricks/sql/result_set.py#L7
>
> I imagine they would want a warning message similar to the one for pyarrow:
>
> https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/src/databricks/sql/client.py#L81-L86

Thanks, updated.

gs11 • Jan 16 '26 12:01

Are there any tests in CI that run without pandas installed? If not, it would be important to add some for future stability.

snowman2 • Jan 16 '26 16:01
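Separately from a dedicated CI job, a unit test can simulate pandas being absent by mapping `sys.modules["pandas"]` to `None`, which makes any subsequent `import pandas` raise `ModuleNotFoundError` (a subclass of `ImportError`). A minimal sketch, assuming a hypothetical `describe_backend` function that degrades gracefully:

```python
import sys
import types
import unittest
from unittest import mock


def describe_backend() -> str:
    """Hypothetical function that falls back when pandas is missing."""
    try:
        import pandas  # noqa: F401  (deferred, optional import)
    except ImportError:
        return "rows-only"
    return "pandas"


class TestWithoutPandas(unittest.TestCase):
    def test_falls_back_when_pandas_missing(self) -> None:
        # A None entry in sys.modules makes `import pandas` raise ModuleNotFoundError.
        with mock.patch.dict(sys.modules, {"pandas": None}):
            self.assertEqual(describe_backend(), "rows-only")

    def test_uses_pandas_when_importable(self) -> None:
        # Inject a stand-in module so this test passes even without pandas installed.
        fake = types.ModuleType("pandas")
        with mock.patch.dict(sys.modules, {"pandas": fake}):
            self.assertEqual(describe_backend(), "pandas")
```

Because these tests patch `sys.modules` only around the function under test, they cannot catch a stray top-level `import pandas` elsewhere in the package; a CI job that installs the connector without pandas would still be needed for that.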

> Are there any tests in CI that run without pandas installed? If not, it would be important to add some for future stability.

I can't comment on its effectiveness, but there's https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/tests/unit/test_client.py#L482

gs11 • Jan 16 '26 16:01

> > Are there any tests in CI that run without pandas installed? If not, it would be important to add some for future stability.
>
> I can't comment on its effectiveness, but there's
>
> https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/tests/unit/test_client.py#L482

That's helpful. Did you check the CI files to see if there is a test run without optional dependencies installed?

snowman2 • Jan 16 '26 20:01

@gs11, I've started the test run and will review once the tests pass.

jprakash-db • Jan 17 '26 08:01

> That's helpful. Did you check the CI files to see if there is a test run without optional dependencies installed?

No, there seems to be a single set of dependencies used for tests. https://github.com/databricks/databricks-sql-python/blob/main/.github/workflows/code-coverage.yml#L52-L62

gs11 • Jan 20 '26 10:01

Fixed the linting error, but plenty of tests are failing for unrelated reasons 😕

gs11 • Jan 20 '26 10:01

@msrathore-db, can you look at this? It looks like some core OS libraries for Kerberos are missing from the GitHub Actions Ubuntu images.

jprakash-db • Jan 22 '26 04:01