Treat pandas as an optional dependency (#489)
What type of PR is this?
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [X] Other
Description
This naive PR changes pandas to be an optional dependency - just like pyarrow. Tests fail but seemingly due to unrelated things.
How is this tested?
- [X] Unit tests
- [X] E2E Tests
- [ ] Manually
- [ ] N/A
Related Tickets & Documents
https://github.com/databricks/databricks-sql-python/issues/489
Any chance to get this reviewed? @deeksha-db @samikshya-db @jprakash-db @yunbodeng-db @jackyhu-db @benc-db
Hey @gs11 thanks for addressing this, I just realized about your PR.
To close #489 you should make sure to avoid importing pandas as top level import in src/databricks/sql/client.py. That would have been my strategy.
I really hope this gets through eventually
To close #489 you should make sure to avoid importing pandas as top level import in
src/databricks/sql/client.py. That would have been my strategy.
Thanks! @FBruzzesi I realize this change wasn't pushed properly to my branch. Updated it as well as rebased to resolve the poetry.lock conflict issues due to upstream being updated.
Hello, I'm just wondering if it's possible to get any movement on this? The large pandas dependency is a blocker for my project.
Any chance of getting this reviewed? @deeksha-db @samikshya-db @jprakash-db @yunbodeng-db @jackyhu-db @benc-db
Is there any way to accelerate this PR? The underlaying issue was opened almost a year ago, and this PR nearing half a year and the issue is still quite blocking for being able to use databricks-sql-python at all.
If there is any way for me to contribute, lmk.
@gs11 it appears that there are conflicts. If I had a guess, it would be the lock file needs to be regenerated.
Pandas is also imported here: https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/src/databricks/sql/result_set.py#L7
I imagine they would want a warning message similar to the one for pyarrow: https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/src/databricks/sql/client.py#L81-L86
Pandas is also imported here:
https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/src/databricks/sql/result_set.py#L7
I imagine they would want a warning message similar to the one for pyarrow:
https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/src/databricks/sql/client.py#L81-L86
Thanks, updated.
Are there any tests in the CI run without pandas installed? If not, it would be important to add for future stability.
Are there any tests in the CI run without pandas installed? If not, it would be important to add for future stability.
I can't comment on the effectiveness but there's https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/tests/unit/test_client.py#L482
Are there any tests in the CI run without pandas installed? If not, it would be important to add for future stability.
I can't comment on the effectiveness but there's
https://github.com/databricks/databricks-sql-python/blob/4b7df5b0fd4da7e9caecbd8042c12e363c6d3d5f/tests/unit/test_client.py#L482
That's helpful. Did you check the CI files to see if there is a test run without optional dependencies installed?
@gs11 Have started the tests to run, will review after the tests have passed
That's helpful. Did you check the CI files to see if there is a test run without optional dependencies installed?
No, there seems to be a single set of dependencies used for tests. https://github.com/databricks/databricks-sql-python/blob/main/.github/workflows/code-coverage.yml#L52-L62
Fixed the linting error but there are plenty of tests failing on unrelated issues 😕
@msrathore-db Can you look at this? Looks like some core Os libraries for kerberos are missing in the actions ubuntu images