Separate standard and expert dependencies (pip install xdem[expert])
Context:
Currently, installing xdem installs all dependencies, including those only required for advanced or expert-level use. This can unnecessarily increase the installation time and complexity for users who only need basic functionality or prefer a lightweight setup.
- pip install xdem → Minimal installation (light mode)
- pip install xdem[expert] → Full installation with additional expert dependencies
Tasks:
- Identify the dependencies required for basic (light) usage.
- [ ] List the "expert" dependencies.
- [ ] Modify the setup.py file to include extras_require. Example:
  extras_require={"expert": ["pytransform3d", "scikit-learn", "scikit-gstat", "pyyaml", ...]}
- [ ] Update the documentation.
- [ ] Add a test in the CI to ensure both installation modes work correctly.
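For the CI task, a minimal sketch of a check that could run after each installation mode, assuming the expert modules are those from the extras_require example above (the final list is still open). It uses the standard-library importlib.util.find_spec, which returns None when a top-level module cannot be found. The helper names are hypothetical, and note that import names differ from pip names.

```python
import importlib.util

# Hypothetical "expert" modules, taken from the extras_require example.
# Import names differ from pip names: scikit-learn -> sklearn,
# scikit-gstat -> skgstat, pyyaml -> yaml.
EXPERT_MODULES = ["pytransform3d", "sklearn", "skgstat", "yaml"]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def check_expert_install(modules=EXPERT_MODULES):
    """Raise if any expert module is missing, e.g. after `pip install xdem[expert]`."""
    missing = missing_modules(modules)
    if missing:
        raise RuntimeError(f"Expert installation is missing: {', '.join(missing)}")
```

The light-mode CI job would only need to check that `import xdem` succeeds, while the expert job would additionally call check_expert_install(). Checking that expert modules are *absent* in light mode is less reliable, since some of them (e.g. PyYAML) may already be pulled in transitively by core dependencies.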
OK, only minor changes in the end! We already talked about optional dependencies on Slack: I only see Matplotlib as a potential addition to the optional dependencies; otherwise, they are all already covered.
Importantly, we should amend the description above: "Currently, installing xdem installs all dependencies, including those only required for advanced or expert-level use." This is wrong; it's the opposite. Currently, we install a "light" mode of the package: all optional dependencies have to be installed separately by the users.
So in short: solving this issue would allow expert users to install optional dependencies directly from pip (and potentially remove Matplotlib as a core dependency). The rest should already be covered :wink:.
And just to re-explain the logic here for the record (as we'll eventually lose the Slack conversation), and to make the link to GeoUtils (where most of the changes will need to happen):
xDEM core dependencies that are not inter-linked are:
- GeoUtils,
- TQDM,
- Numba.
Normally that's it, if I'm not mistaken. Everything else listed in environment.yml is already a dependency of one of these packages.
Then GeoUtils core dependencies are:
- Rioxarray,
- GeoPandas,
- PyProj,
- SciPy,
- Matplotlib,
- Dask.
Same here, that's it. Everything else listed in environment.yml is linked to those. Rioxarray will become a core dependency with the Xarray accessor soon (https://github.com/GlacioHack/geoutils/pull/446).
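The claim that everything else comes in transitively can be cross-checked against installed package metadata. A minimal sketch, assuming the standard-library importlib.metadata.requires (Python 3.8+), which returns the Requires-Dist entries of an installed distribution; entries that sit behind an extra carry an `extra == "..."` environment marker. The function names are hypothetical, and the sketch only reads direct requirements, without following them recursively.

```python
import re
from importlib.metadata import PackageNotFoundError, requires

# Matches the extra name in markers such as: extra == "opt"  or  extra == 'opt'
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def split_requirements(reqs):
    """Split requirement strings into core requirements and a {extra: [reqs]} dict.

    Requirements with other markers only (e.g. python_version) are kept as core,
    and their marker is dropped.
    """
    core, extras = [], {}
    for req in reqs:
        spec, _, marker = req.partition(";")
        match = _EXTRA_MARKER.search(marker)
        if match:
            extras.setdefault(match.group(1), []).append(spec.strip())
        else:
            core.append(spec.strip())
    return core, extras


def declared_requirements(dist_name):
    """Return (core, extras) for an installed distribution, or ([], {}) if absent."""
    try:
        return split_requirements(requires(dist_name) or [])
    except PackageNotFoundError:
        return [], {}
```

For example, declared_requirements("geoutils") in an environment where GeoUtils is installed would list its direct core requirements, which can then be compared with the entries in environment.yml.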
If we want to use the same logic as Xarray for optional dependencies (here: https://docs.xarray.dev/en/stable/getting-started-guide/installing.html#optional-dependencies), we could move Matplotlib (as in the previous comment) and also Dask (in GeoUtils) to the optional dependencies.
But I think we should keep SciPy as a core dependency: it's much more integral to several of our core functionalities than in Xarray, where it's just an optional engine for netCDF reading and the interp() function.
I would not make it too complex (Xarray has 4 different groups of optional dependencies...) and would keep only 2 levels of install. For example, it makes sense for Xarray to have Matplotlib in the viz group because it comes with other, less common dependencies like Cartopy. But I would not make Matplotlib optional in xDEM: it is very standard and, for example, it will soon be needed for all workflows. Dask, however, is heavy, so I would make it optional. I would also make Rioxarray optional, as it is needed only by those using the Xarray accessor, which is not all users.
Yes, I think two levels is perfect.
But I would definitely make Matplotlib optional (it is actually quite heavy), given that our package does not focus on plotting at all: it is only used in the single function plot().
And I think Rioxarray will have to become a main dependency eventually, as it relates to all functionalities of the package (a bit like Rasterio for the Raster class). Even if it doesn't appear important now, I'd say that more than 50% of our current users are just waiting to switch to the accessor the moment it becomes available, and new users will be using Xarray most of the time. So within a year or two, the Xarray/Pandas accessor will likely be the primary data object, not the Raster/Vector classes.
Regarding Matplotlib: as I said, it will be needed for all workflows that produce default output figures, so I would still argue not to make it optional.
@adebardo @belletva I thought we had already discussed that xDEM workflows would only run with optional dependencies anyway? We don't want to impose on all users the dependencies specific to YAML reading/checking, HTML creation and PDF rendering (Cerberus, PyYAML and WeasyPrint). In that case, Matplotlib would not be needed as a main dependency either.
In general, I think we should follow the good practices of bigger packages like Xarray/Pandas/etc. here. It's very easy for users to add Matplotlib to their environment file, or to specify xdem[all] there. But it's difficult for them to get rid of Matplotlib if they don't need it and we impose it anyway, even though it's only used in the single function plot().
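If Matplotlib becomes optional, the usual pattern in packages like Xarray/Pandas is to import it lazily inside the function that needs it and to raise a helpful error otherwise. A minimal sketch, assuming a hypothetical helper import_optional and a hypothetical extra name (the thread mentions both xdem[expert] and xdem[all]); the plot() below is an illustrative stand-in, not the actual GeoUtils/xDEM implementation.

```python
import importlib


def import_optional(module_name, pip_name=None, extra="all"):
    """Import an optional dependency, or raise an ImportError explaining how to install it.

    `extra` is a hypothetical name for the xdem extra that would pull in the dependency.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        pip_name = pip_name or module_name.split(".")[0]
        raise ImportError(
            f"'{module_name}' is an optional dependency. Install it with "
            f"'pip install {pip_name}' or 'pip install xdem[{extra}]'."
        ) from err


def plot(data):
    """Hypothetical plotting function: Matplotlib is only imported when plot() is called."""
    plt = import_optional("matplotlib.pyplot", pip_name="matplotlib")
    fig, ax = plt.subplots()
    ax.imshow(data)
    return fig
```

With this design, `import xdem` works without Matplotlib installed, and only users who call plot() are asked to install it.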
I'll link in https://github.com/GlacioHack/xdem/pull/767.