dask-sql
[ENH] Redefining core dependencies, improving documentation on non-core dependencies
Recently, there's been some thought given to which dependencies Dask-SQL should be packaged with - there are a lot of packages included in the standard conda package that will likely go unused by a majority of users (e.g. fastapi and uvicorn, which are only used when running a SQL server), while packages that would likely see more frequent use are not included (e.g. dask-ml and scikit-learn, which are used for ML queries).
I think it might be worth revisiting which packages need to be core dependencies (i.e. those required to create a context and perform basic queries), and from there decide how we should document the need for additional packages. I can think of a few different ways this could be done, either explicitly through improved documentation:
- include warnings in the docs when specific functionality is tied to packages not included in the base install (e.g. "to run a SQL server, `fastapi` and `uvicorn` are required")
- include an installation selector in the docs to direct users to the correct Conda installation command for their desired usage (e.g. a "dask-sql + server" option that would direct users to run `conda install dask-sql fastapi uvicorn`)
Or implicitly through changes to our packaging setup:
- creating a `dask-sql` metapackage that includes all or most of the non-core dependencies, with `dask-sql-core` including only the core packages
  - if we wanted, this could be more granular by making several `dask-sql-*` metapackages for different functionality (e.g. `dask-sql-server` would install `dask-sql`, `fastapi`, and `uvicorn`)
- if a non-core package is used frequently enough, adding it to the core dependencies so that it is always installed with Dask-SQL
Since packaging changes require considerably more work and user feedback, I think a good strategy is to improve the documentation first, and wait to see if that is sufficient before making larger changes to the package. Still, I think it would be useful to identify which packages are core vs. non-core dependencies for Dask-SQL, and maybe explicitly define this through the `extras_require` section in our setup file.
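As a rough sketch of what the `extras_require` idea could look like (not a proposal for the final split): the server and ML groupings below come from the packages mentioned above, while the extra names (`server`, `ml`, `all`), the `with_all_extra` helper, and the two-package core list are placeholders I made up for illustration. The real core list is exactly what we'd need to decide.

```python
# Sketch only: the package groupings are assumptions for discussion,
# not the actual dependency lists of dask-sql.

# Placeholder core list; deciding what really belongs here is the open question.
CORE_REQUIRES = ["dask", "pandas"]

# Hypothetical extras, grouped by the functionality that needs them.
EXTRAS_REQUIRE = {
    "server": ["fastapi", "uvicorn"],   # only needed to run a SQL server
    "ml": ["dask-ml", "scikit-learn"],  # only needed for ML queries
}


def with_all_extra(extras):
    """Return a copy of `extras` plus an "all" group combining every other group."""
    combined = sorted({pkg for group in extras.values() for pkg in group})
    return {**extras, "all": combined}


SETUP_KWARGS = {
    "name": "dask-sql",
    "install_requires": CORE_REQUIRES,
    "extras_require": with_all_extra(EXTRAS_REQUIRE),
}

# In setup.py this would be passed along as: setuptools.setup(**SETUP_KWARGS)
```

With something like this in place, a pip user could run `pip install "dask-sql[server]"` to pull in the server packages. Extras only apply to pip installs, so the conda side would still need either the documentation changes or the metapackage approach described above.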
cc @ayushdg @VibhuJawa @randerzander