Refactor NetCDF4 filehandler classes
This may be more work than this PR can support right now, but can we:
Split the base class and this new class into 3 total classes. Essentially a "netcdf4 handler" and an "h5netcdf file handler" and then a single class that uses both of these classes as "mixins"? Or maybe just sub-objects? The creation of the object can be determined when the file handler tries to open self.filename. It still has the try/except but instead of calling NetCDF4-python directly it creates an instance of the "netcdf4 handler" class and if exception then it makes the h5netcdf handler class. They would be essentially two classes implementing the file-reading interfaces that the Satpy NetCDF4 utility handler needs to get attributes/data.
Originally posted by @djhoese in https://github.com/pytroll/satpy/pull/2305#discussion_r1068228943
And another comment from @djhoese https://github.com/pytroll/satpy/pull/2305#discussion_r1069570070
The try/except might be the best option. I think we're generally OK with the behavior of "do whatever you have to do to read this file", but we'll have to keep it in mind when debugging things for us and user bug reports ("wait, do you only have h5netcdf installed?").
The only reason not to do the try/except is that NetCDF4-C may be able to read URLs (HTTP for sure, s3:// some times I think) but may not perform as well as h5netcdf depending on what modes/methods are being used to read the data (as a stream of bytes versus HTTP byte range requests). The worst part about this would be that it would be hard to detect because things would still be working, just slower.
I think my overall vote is for merging this stuff into one user-level class (likely constructed from 2 or more other classes) and leaving the try/except.
:thinking: Would it be worth it to make the NetCDF4 and H5NetCDF handlers complete fully-functioning file handler classes and then also the one that shares them?
It would be nice to have some refactoring so that http byte-range/S3 support in netCDF-c can be used. Right now, it's only possible to use h5netcdf for this since file-like objects are being passed around. My current work-around is to subclass FSFile, which creates xarray's own netCDF4DataStore that can be handed to open_dataset():
from satpy.readers import FSFile
from xarray.backends import NetCDF4DataStore
from netCDF4 import Dataset
class NC4Bytes(FSFile):
def open(self, *args, **kwargs):
return NetCDF4DataStore(Dataset(self._file + '#mode=bytes'))
@dopplershift Pre-FSFile I had a PR where I tried to move this forward:
https://github.com/pytroll/satpy/pull/1349
I'm not suggesting my old workaround back then is the best now, but just wanted to point out that it was attempted. I've never had a good understanding (high level or low level) of how the xarray data stores and caches work so I've never had a good idea how it should replace some of the old pre-xarray hacky stuff we have in Satpy.
All of that said, this portion of the refactor may be its own issue. As described in my PR above, the main issues with #mode=bytes style URLs is not the filehandler classes talked about here. It is that the base YAML reader class (if I'm remembering correctly) needs to match file names to file patterns (parse out filename metadata) so the #mode=bytes needs to be removed. This assumes the user is passing the URL that way. Your solution is the opposite. I'm not sure which is best.
This issue that we're commenting in is more about the structure of the custom NetCDF wrapper outside of xarray's open_dataset.