satpy
SEVIRI hrit reader could be using dask and memmap for the metadata
Feature Request
The SEVIRI HRIT reader currently reads the metadata from the prologue and epilogue files eagerly with numpy's fromfile, and then reduces the memory footprint by removing every array bigger than 100 elements. We could instead use a memory map and dask to have the data opened lazily.
An example of how this could be implemented:
data = np.memmap(fp_, dtype=hrit_prologue, shape=1, offset=self.mda['total_header_length'], mode="readonly")
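To make the idea concrete, here is a minimal self-contained sketch of the same call on a toy file. The dtype `toy_prologue`, the constant `HEADER_LENGTH` and the file name are made up for illustration; they only stand in for `hrit_prologue` and `self.mda['total_header_length']` and are not satpy's real definitions.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the real hrit_prologue dtype (which is far larger).
toy_prologue = np.dtype([
    ("satellite_id", ">u2"),
    ("calibration", ">f8", (1000,)),  # a "big" field with more than 100 elements
])
HEADER_LENGTH = 16  # stand-in for self.mda['total_header_length']

# Write a fake file: some header bytes followed by one prologue record.
record = np.zeros(1, dtype=toy_prologue)
record["satellite_id"] = 323
record["calibration"] = np.arange(1000)

path = os.path.join(tempfile.mkdtemp(), "toy_prologue.bin")
with open(path, "wb") as out:
    out.write(b"\x00" * HEADER_LENGTH)
    out.write(record.tobytes())

# Map the record instead of reading it; values are only paged in when accessed.
with open(path, "rb") as fp_:
    data = np.memmap(fp_, dtype=toy_prologue, shape=1,
                     offset=HEADER_LENGTH, mode="readonly")
    satellite_id = int(data["satellite_id"][0])
    last_coef = float(data["calibration"][0, -1])
```

With this approach the large `calibration` field does not have to be dropped to save memory, since nothing is loaded until a field is indexed.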
The recarray2dict function would need to be adjusted though to work with dask arrays.
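As a rough idea of what such an adjustment could look like, here is a sketch of a conversion that keeps small fields as plain values and wraps large ones in dask arrays. The function `lazy_record2dict` and its `max_eager_size` parameter are hypothetical and written from scratch for this example; this is not satpy's recarray2dict.

```python
import dask.array as da
import numpy as np


def lazy_record2dict(rec, max_eager_size=100):
    """Convert a 1-element structured array to a nested dict (hypothetical helper).

    Fields larger than ``max_eager_size`` elements are wrapped in dask arrays
    instead of being dropped or copied.
    """
    out = {}
    for name in rec.dtype.names:
        field = rec[name]
        if field.dtype.names is not None:
            # Nested structure: recurse into its own fields.
            out[name] = lazy_record2dict(field, max_eager_size)
        elif field.size > max_eager_size:
            # Large array: keep it lazy, one chunk covering the whole field.
            out[name] = da.from_array(field, chunks=field.shape)
        elif field.size == 1:
            out[name] = field.item()
        else:
            out[name] = np.array(field)
    return out


# Toy record with one nested structure and one large field.
toy_dtype = np.dtype([
    ("id", "<u2"),
    ("sub", [("flag", "u1"), ("coefs", "<f8", (500,))]),
])
rec = np.zeros(1, dtype=toy_dtype)
rec["id"] = 7
rec["sub"]["coefs"] = 1.5

mda = lazy_record2dict(rec)
coef_sum = float(mda["sub"]["coefs"].sum().compute())
```

The same function should accept a memory-mapped record, since indexing a memmap by field name returns another array-like view.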
Would that have any effect on the other readers that use recarray2dict? I think the Electro-L reader does, for example...
It might indeed. The best would probably be to implement this change in all HRIT readers at once.
Just curious, does memmap accept a file pointer/handle or a filename string? Does it change how things are handled at the low level? What about how dask sees it when you pass it to from_array? I'm thinking that in the long term we may want to use more filename-based stuff, but I think as-is dask will just send the data to any distributed workers (threaded workers probably aren't a problem).
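On the first question: according to the numpy documentation, np.memmap accepts a filename string, a pathlib.Path, or an open file object. A small sketch showing both forms on a toy file (the file name and contents are invented for the example; how dask ships the result to distributed workers is not covered here):

```python
import os
import tempfile

import numpy as np

# Write a small toy binary file to map.
path = os.path.join(tempfile.mkdtemp(), "toy_metadata.bin")
np.arange(8, dtype="<u4").tofile(path)

# Form 1: pass a filename string; numpy opens the file itself.
from_name = np.memmap(path, dtype="<u4", mode="r")
name_values = from_name.tolist()
recorded_name = from_name.filename  # the memmap remembers which file it maps

# Form 2: pass an already open file object.
with open(path, "rb") as fp_:
    from_handle = np.memmap(fp_, dtype="<u4", mode="r")
    handle_values = from_handle.tolist()
```

Since the memmap keeps the file name around, a filename-based approach could in principle reopen the file wherever the data is needed instead of passing the mapped data around.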
A comment, as I was checking the bunzip2 case: such a proposal requires the file to remain open, which would not be possible with compressed (bz2, for instance) files on disk.
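To illustrate the constraint, here is a sketch of two possible workarounds for a bz2-compressed file, using only the standard library and numpy. The file names and payload are invented for the example, and neither option is necessarily what the reader does or should do.

```python
import bz2
import os
import shutil
import tempfile

import numpy as np

# Create a toy bz2-compressed binary file.
payload = np.arange(10, dtype="<u2")
tmpdir = tempfile.mkdtemp()
packed = os.path.join(tmpdir, "toy.bin.bz2")
with bz2.open(packed, "wb") as out:
    out.write(payload.tobytes())

# Option 1: decompress in memory. Simple, but everything is read eagerly.
with bz2.open(packed, "rb") as src:
    eager = np.frombuffer(src.read(), dtype="<u2")

# Option 2: decompress to a temporary file first, then memory-map that file.
unpacked = os.path.join(tmpdir, "toy.bin")
with bz2.open(packed, "rb") as src, open(unpacked, "wb") as dst:
    shutil.copyfileobj(src, dst)
lazy = np.memmap(unpacked, dtype="<u2", mode="r")
```

Option 2 keeps the lazy access pattern at the cost of a decompressed copy on disk that has to be cleaned up afterwards.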