
[ENH] Redefining core dependencies, improving documentation on non-core dependencies

Open charlesbluca opened this issue 3 years ago • 0 comments

Recently, there's been some discussion about which dependencies Dask-SQL should be packaged with. The standard conda package includes a number of packages that will likely go unused by a majority of users (e.g. fastapi and uvicorn, which are only used when running a SQL server), while packages that would likely see more frequent use are not included (e.g. dask-ml and scikit-learn, which are used for ML queries).

I think it's worth revisiting which packages need to be core dependencies (i.e. those required to create a context and perform basic queries), and from there deciding how we should document the need for additional packages. I can think of a few ways this could be done, either explicitly through improved documentation:

  • include warnings in the docs when specific functionality is tied to packages not included in the base install (e.g. "to run a SQL server, fastapi and uvicorn are required")
  • include an installation selector in the docs to direct users to the correct Conda installation command for their desired usage (e.g. a "dask-sql + server" option that would direct users to run conda install dask-sql fastapi uvicorn)
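
To make the selector idea concrete, here is a minimal sketch of the mapping it could be built on. The feature names ("server", "ml") and the helper conda_command are hypothetical; only the package groupings come from the examples above, and a real docs selector would more likely be a small web widget than Python.

```python
# Hypothetical feature -> extra packages mapping, taken from the examples
# in this issue; not an authoritative list of dask-sql's optional deps.
OPTIONAL_PACKAGES = {
    "server": ["fastapi", "uvicorn"],      # only needed to run a SQL server
    "ml": ["dask-ml", "scikit-learn"],     # only needed for ML queries
}


def conda_command(*features):
    """Build the conda install command for dask-sql plus the chosen features."""
    packages = ["dask-sql"]
    for feature in features:
        for pkg in OPTIONAL_PACKAGES[feature]:
            if pkg not in packages:  # avoid duplicates across features
                packages.append(pkg)
    return "conda install " + " ".join(packages)


print(conda_command("server"))
```

Selecting "dask-sql + server" here yields the same command mentioned above, and selecting no features falls back to the base install.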

Or implicitly through changes to our packaging setup:

  • creating a dask-sql metapackage that includes all or most of the non-core dependencies, with dask-sql-core including only the core packages
    • if we wanted, this could be more granular by making several dask-sql-* metapackages for different functionality (e.g. dask-sql-server would install dask-sql, fastapi, and uvicorn)
  • if a non-core package is used frequently enough, adding it to the core dependencies so that it is always installed with Dask-SQL

Since packaging changes require considerably more work and user feedback, I think a good strategy is to improve the documentation first, and wait to see if that is sufficient before making larger changes to the package. Still, I think it would be useful to identify which packages are core vs. non-core dependencies for Dask-SQL, and maybe define this explicitly through the extras_require section in our setup file.
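
As a rough illustration of the extras_require idea, something along these lines could work. The extra names ("server", "ml", "all") and the CORE_REQUIRES list are assumptions for the sake of the sketch, not a proposal for the final split; the optional package groupings are the ones mentioned earlier in this issue.

```python
# Sketch of how core vs. non-core dependencies might be declared.
# CORE_REQUIRES is a placeholder; deciding what is actually core is
# the open question of this issue.
CORE_REQUIRES = ["dask", "pandas"]  # hypothetical core set

EXTRAS_REQUIRE = {
    "server": ["fastapi", "uvicorn"],      # running a SQL server
    "ml": ["dask-ml", "scikit-learn"],     # ML queries
}
# Convenience extra that pulls in every optional package once.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for pkgs in EXTRAS_REQUIRE.values() for pkg in pkgs}
)

# In setup.py these would be passed to setuptools, roughly:
#
#   from setuptools import setup
#   setup(
#       name="dask-sql",
#       install_requires=CORE_REQUIRES,
#       extras_require=EXTRAS_REQUIRE,
#   )

print(EXTRAS_REQUIRE["all"])
```

With a layout like this, pip users could opt in with pip install "dask-sql[server]". Note that extras are a pip concept, so conda users would still need either documentation or metapackages as described above.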

cc @ayushdg @VibhuJawa @randerzander

charlesbluca, Mar 21 '22 14:03