Make netCDF4 optional.
Closes #2028.
Codecov Report
Merging #2029 (27720b8) into main (c78deef) will increase coverage by 0.00%. The diff coverage is 100.00%.
```diff
@@           Coverage Diff           @@
##             main    #2029   +/-  ##
=======================================
  Coverage   90.64%   90.64%
=======================================
  Files         117      117
  Lines       12553    12554     +1
=======================================
+ Hits        11379    11380     +1
  Misses       1174     1174
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| arviz/data/inference_data.py | 83.04% <100.00%> (+0.02%) | :arrow_up: |
This will break a lot of code and environments, so we need to be careful about how we do it. I don't think removing the dependency straight away is a good idea.
The whole of ArviZ assumes netcdf files can be stored and read. For example, the doc failures are due to that: `arviz.load_arviz_data` reads netcdf files (either packaged with the library or downloaded from figshare), and most documentation examples are based on it. Changing that would also mean that after doing `pip install arviz`, people won't be able to save InferenceData as netcdf as they can now.
OK, maybe it should stay a dependency then?
I think it should be possible to install ArviZ without installing libraries that are not required throughout the library (netcdf, but also matplotlib, for example, like I said before). However, I am also worried about `pip install arviz` defaulting to an installation that can't save to file nor plot (in the extreme) unless some of the possible optional dependencies are installed manually as well, and about backward compatibility issues.
Let's see what the rest of the team thinks. One thing we discussed was splitting the library into data/stats/plots but releasing in sync for simplicity, and keeping `arviz` as the bundle of the three (plus maybe netcdf and/or matplotlib). But it might be better to rip the band-aid off: make `arviz` a minimal install and recommend installing `arviz[things]`.
I don't recall if it is possible to do something like `pip install arviz[minimal]` -- would that help with this? I'm pretty happy to allow bespoke `[special_install_requires]` extras if it enables these use cases.
Catching up. I agree with Oriol: we definitely can't do it this "hardline" way, it's going to break a lot of things. I think we have to do it the way Colin recommends: netcdf by default, but it can be turned off with a flag.
We should do this to enable support for pyscript, and I think jupyterlite as well since it runs on pyodide. Those would both be awesome tools to allow users to test out arviz or learn statistical computing without needing to install a local env.
Also note that for the pyscript use case we'll need to compile both wheels every time we release, so they're fetchable by pyscript users. Outside the scope of this PR, but something to note.
> I don't recall if it is possible to do something like `pip install arviz[minimal]` -- would that help with this?
That would be amazing, but I don't think it is possible. AFAIK the `[name]` syntax can only be used to *add* dependencies to the set of requirements. Does someone know how to use this `[]` syntax, or some alternative, to allow this?
> I'm pretty happy to allow bespoke `[special_install_requires]` if it allows usage.
Just to make sure we are all on the same page: we already provide `arviz[all]` to install all requirements and optional dependencies at once. See https://python.arviz.org/en/latest/getting_started/Installation.html
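For reference, extras are declared in the packaging metadata. A hypothetical sketch of how a minimal base install plus extras could look in `pyproject.toml` (the group names and version pins here are illustrative, not ArviZ's actual metadata):

```toml
[project]
name = "arviz"
dependencies = [
    # minimal core only
    "numpy>=1.21",
    "xarray>=0.21",
]

[project.optional-dependencies]
# hypothetical extra groups; extras can only ADD requirements,
# which is why a subtractive `arviz[minimal]` is not expressible
netcdf = ["netCDF4"]
plots = ["matplotlib"]
all = ["netCDF4", "matplotlib"]
```

Users would then opt in with, e.g., `pip install "arviz[netcdf,plots]"`.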
> Also note that for the pyscript use case we'll need to compile both wheels every time we release, so they're fetchable by pyscript users. Outside the scope of this PR, but something to note.
Whenever we release, the Azure job already builds a wheel and uploads it to PyPI, see https://pypi.org/project/arviz/#files. Only one wheel is generated, though, and you mentioned "both wheels". @canyon289 can you create an issue with what is missing, so we remember to update the release Azure job?
> Whenever we release, the Azure job already builds a wheel and uploads it to PyPI, see https://pypi.org/project/arviz/#files. Only one wheel is generated, though, and you mentioned "both wheels". @canyon289 can you create an issue with what is missing, so we remember to update the release Azure job?
Sure, but we only need to do this if we go through with the proposal above, and it's uncertain whether we will.
What about using h5netcdf? That might make it a lot less heavy since the netCDF4 C binaries won't be necessary.
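To illustrate, a hedged sketch of how a library could fall back across netCDF backends at runtime (the helper name and candidate order are hypothetical; h5netcdf itself is pure Python and sits on top of h5py, so the netCDF4 C bindings are not needed):

```python
import importlib.util


def pick_netcdf_module(candidates=("h5netcdf", "netCDF4", "scipy")):
    """Return the first importable netCDF backend module name, or None.

    Hypothetical helper: h5netcdf is tried first because it avoids the
    netCDF4 C bindings; scipy's netcdf support (classic format only) is
    the last resort.
    """
    for name in candidates:
        # find_spec checks importability without actually importing
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

A library could use the returned name to choose, e.g., the `engine` argument that xarray's `to_netcdf`/`open_dataset` accept.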
Closing in favor of https://github.com/arviz-devs/arviz/pull/2122.