databricks-sql-python

Remove requirement: openpyxl

Status: Open. davebelais opened this issue 1 year ago • 5 comments

You include openpyxl as a requirement of this package; however, openpyxl is not used by this library, as you can see from this search. Please remove this requirement to reduce bloat in applications and libraries that depend on this package. Thanks!

davebelais, Jan 11 '24
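A minimal sketch of how one might double-check a claim like this locally, assuming a checkout of the library's source: it walks the `.py` files under a directory with Python's `ast` module and reports the files that import `openpyxl` directly. The function name `modules_importing` and the directory layout are hypothetical. A static scan like this only finds explicit `import` statements; it would not catch indirect use, such as another library loading openpyxl on the connector's behalf, or imports done dynamically through `importlib`.

```python
import ast
from pathlib import Path


def modules_importing(root: Path, package: str) -> list[Path]:
    """Return the .py files under `root` that import `package` or one of its submodules."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # `import a.b as c` -> alias.name == "a.b"
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # Absolute `from a.b import c`; relative imports are skipped.
                names = [node.module]
            else:
                continue
            if any(n == package or n.startswith(package + ".") for n in names):
                hits.append(path)
                break  # one match is enough to report this file
    return hits
```

For example, `modules_importing(Path("src"), "openpyxl")`, where `src` stands in for wherever the package source lives, should return an empty list if the library itself never imports openpyxl.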

This is a good catch. openpyxl is the seventh-largest dependency of databricks-sql-connector, weighing in at 1.98 MB, and we have a big effort underway to reduce the overall installation size. Pull requests for this will be coming in the next week or so.

openpyxl isn't used by the connector itself, but it is used by our end-to-end (e2e) test suite. The solution is simply to move it to the development dependencies in pyproject.toml so that it is only installed in development mode.

susodapop, Jan 12 '24
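Once a dependency has been moved out of the runtime requirements, one way to confirm the change from the consumer's side is to read the installed distribution's metadata with `importlib.metadata.requires()`. In published metadata, optional extras carry an `extra == "..."` environment marker, while development-only dependencies (for example, Poetry's development dependency groups) are not listed at all. The sketch below uses hypothetical helper names and a deliberately simple parser rather than the full `packaging` requirement grammar; it treats every requirement without an `extra` marker as a hard dependency, ignoring other markers such as `python_version`.

```python
import re
from importlib.metadata import requires

# Leading distribution name of a requirement string such as 'pyarrow>=14 ; extra == "pyarrow"'.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a distribution name as PEP 503 does (case- and separator-insensitive)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def hard_requirements(req_strings: list[str]) -> set[str]:
    """Return normalized names of requirements that are not gated behind an extra."""
    names = set()
    for req in req_strings:
        spec, _, marker = req.partition(";")
        if "extra" in marker:
            # Only installed with e.g. `pip install pkg[extra]`, so not a default dependency.
            continue
        match = _NAME.match(spec)
        if match:
            names.add(normalize(match.group(1)))
    return names


def is_hard_dependency(dist_name: str, dep: str) -> bool:
    """True if the installed `dist_name` declares `dep` outside any extra.

    Raises importlib.metadata.PackageNotFoundError if `dist_name` is not installed.
    """
    return normalize(dep) in hard_requirements(requires(dist_name) or [])
```

For example, `is_hard_dependency("databricks-sql-connector", "openpyxl")` would be expected to return `True` on a release that still lists openpyxl as a requirement and `False` on one where it is development-only.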

+1 for this idea. The total install size is very large, IMO.

FYI: I'm currently trying to work around an issue where adding databricks-sql-python to a Lambda function pushes the function size over the 250 MB limit.

MichaelAnckaert, Mar 13 '24

+1 Same issue

joeraver, Apr 30 '24

@MichaelAnckaert the biggest culprits for install size are pyarrow and numpy. Removing openpyxl makes sense toward the same goal, but it accounts for only a small fraction of the total install size.

susodapop, Apr 30 '24
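To see which packages dominate an environment's footprint (for instance, to check whether pyarrow and numpy outweigh openpyxl in a given install), a rough sketch is to sum the file sizes recorded in each installed distribution's `RECORD` via `importlib.metadata`. This is an approximation under stated assumptions: it counts only files listed in `RECORD` (entries without a recorded size count as zero), reports decimal megabytes, and keeps only the first distribution found for each name on `sys.path`. The helper names `distribution_sizes` and `print_largest` are hypothetical.

```python
from importlib.metadata import distributions


def distribution_sizes() -> list[tuple[str, int]]:
    """Return (name, bytes) for each installed distribution, largest first."""
    sizes: dict[str, int] = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if not name or name in sizes:
            # Skip nameless entries and shadowed duplicates further down sys.path.
            continue
        # PackagePath.size comes from RECORD and may be None (e.g. for RECORD itself).
        sizes[name] = sum(f.size or 0 for f in (dist.files or []))
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


def print_largest(top: int = 10) -> None:
    """Print the `top` largest distributions in decimal megabytes."""
    for name, size in distribution_sizes()[:top]:
        print(f"{name:<30} {size / 1_000_000:8.2f} MB")


if __name__ == "__main__":
    print_largest()
```

In an environment with databricks-sql-connector installed, `print_largest()` would list the heaviest packages; the exact figures depend on platform and package versions, so they may differ from the numbers quoted in this thread.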

+1 same issue

henryhueske, Jul 08 '24