Decide, document (and rethink?) optional dependency handling
Currently the approach to handling optional dependencies is to not define codec classes if the dependency is not present (e.g., see https://github.com/zarr-developers/numcodecs/pull/637). This leads to a poor user experience when you want to use an optional codec but haven't installed its optional dependency: if you try to import the codec, you get an import error with no indication that the fix is to install a specific package (see https://github.com/zarr-developers/numcodecs/issues/526).
I propose that we switch to a model where codec classes are always defined, and instantiating a codec class whose optional dependency is missing raises a helpful error message.
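To make the proposal concrete, here is a minimal sketch of what "always define the class, fail at instantiation" could look like. It is not the numcodecs implementation: the package name `some_optional_backend`, the helper `_optional_import`, and the class `OptionalCodec` are all hypothetical names chosen for illustration.

```python
import importlib


def _optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical package name; it is assumed to be missing in this sketch.
_backend = _optional_import("some_optional_backend")


class OptionalCodec:
    """Illustrative codec that is defined even without its dependency."""

    codec_id = "optional_codec"

    def __init__(self, level=1):
        # The check is deferred to instantiation, so importing the class
        # always works and the user sees an actionable message.
        if _backend is None:
            raise ImportError(
                f"The {self.codec_id!r} codec requires the optional package "
                "'some_optional_backend'. Install it to use this codec."
            )
        self.level = level
```

With this shape, importing `OptionalCodec` succeeds regardless of what is installed, and only `OptionalCodec()` fails, with a message that names the missing package.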
At some point I will create a PR with a concrete implementation of this change to help see what it would mean, but please share opinions on whether there are other/better ways to improve the user experience around optional dependency handling.
Pinging @jakirkham to see if you have any thoughts, because I know you've been tidying this up recently.
We have allowed a proliferation of increasingly experimental or custom codecs in Numcodecs. However, this has come at a substantial maintenance cost, as they don't always get updated in a timely fashion or become incompatible with Python or NumPy releases. So it requires lots of finessing in requirements, warnings/errors, and CI.
We have tried to manage this by making them optional and guarding them in various ways. Though doing this correctly is not always straightforward for other contributors.
I think the answer has to be moving these out into separate repos/libraries. IOW, Numcodecs extensions that users can install or not.
> I propose that we switch to a model where codec classes are always defined, and instantiating a codec class whose optional dependency is missing raises a helpful error message.
So I tried this out over at https://github.com/zarr-developers/numcodecs/pull/666, and it didn't work because (at least) ZFPY has default arguments in the class method signatures that require values from the zfpy package.
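The underlying obstacle is that Python evaluates default argument values when the `def` statement runs, not when the function is called. The sketch below illustrates this with a stand-in for a missing package; the attribute name `mode_default` and both class names are hypothetical and are not taken from zfpy or numcodecs. `LazyCodec` shows one conceivable workaround (a `None` sentinel resolved at call time), at the cost of no longer showing the real default in the signature.

```python
_backend = None  # stands in for an optional package that failed to import


def define_eager_codec():
    """Try to define a class whose default argument needs the backend."""

    # The default below is evaluated as soon as this `def __init__` runs,
    # so the class body fails before anyone can instantiate the codec.
    class EagerCodec:
        def __init__(self, mode=_backend.mode_default):
            self.mode = mode

    return EagerCodec


class LazyCodec:
    """Same idea, but the backend is only touched at instantiation."""

    def __init__(self, mode=None):
        if _backend is None:
            raise ImportError(
                "LazyCodec requires the optional package "
                "'some_optional_backend' (hypothetical name)."
            )
        # Resolve the real default only once the backend is known to exist.
        self.mode = _backend.mode_default if mode is None else mode
```

Calling `define_eager_codec()` fails with an `AttributeError` during class definition, whereas `LazyCodec` can be defined and only raises the more helpful `ImportError` when instantiated.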
With the current way codecs are namespaced, i.e. numcodecs.{CODEC}, I don't think there's a nicer way to warn or error to users if a dependency is missing. So I think the choice is:
1. Change namespacing so codecs are in numcodecs.{codec-sub-module}.{CODEC}, forcing users to import the submodule to use it (and removing the imports from numcodecs/__init__.py)
2. Keep the status quo where codecs just silently don't exist if a dependency isn't installed
I don't think I have particularly strong opinions either way - if I was writing from scratch I'd definitely go for 1), but given the pain of changing namespacing perhaps we should just stick with 2)?
Given that we have a dynamic codec registry, why should users be doing from numcodecs import GZip, instead of something like GZip = numcodecs.get_codec('GZip')?
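As a rough illustration of the registry-based lookup suggested here, the sketch below shows how a lookup function could report a missing optional dependency by name. It is a standalone toy, not the numcodecs registry: `register_codec`, `register_unavailable`, `get_codec_class`, and the toy `GZip` class are all hypothetical, and the real `get_codec` may take different arguments.

```python
_registry = {}  # codec id -> codec class
_missing = {}   # codec id -> name of the package that is not installed


def register_codec(cls):
    """Register a codec class under its `codec_id`."""
    _registry[cls.codec_id] = cls
    return cls


def register_unavailable(codec_id, package):
    """Record that a known codec cannot be used without `package`."""
    _missing[codec_id] = package


def get_codec_class(codec_id):
    """Look up a codec class, explaining how to fix a missing dependency."""
    if codec_id in _registry:
        return _registry[codec_id]
    if codec_id in _missing:
        raise ImportError(
            f"Codec {codec_id!r} is known but unavailable: "
            f"install the optional package {_missing[codec_id]!r}."
        )
    raise KeyError(f"Unknown codec: {codec_id!r}")


@register_codec
class GZip:
    """Toy stand-in for a codec with no optional dependencies."""

    codec_id = "gzip"

    def __init__(self, level=1):
        self.level = level


# A codec whose dependency is assumed to be missing in this sketch.
register_unavailable("zfpy", "zfpy")
```

Because the lookup happens in a function call rather than an import statement, the registry gets a chance to distinguish "unknown codec" from "known codec, dependency not installed" and word the error accordingly.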