
SEVIRI HRIT reader could use dask and memmap for the metadata

Open mraspaud opened this issue 3 years ago • 4 comments

Feature Request

The SEVIRI HRIT reader currently reads the metadata from the prologue and epilogue files eagerly with numpy's fromfile, and then reduces the memory footprint by removing every array bigger than 100 elements. We could instead use a memory map and dask to open the data lazily.

An example of how this could be implemented:

data = np.memmap(fp_, dtype=hrit_prologue, shape=1, offset=self.mda['total_header_length'], mode="readonly")

The recarray2dict function would need to be adjusted though to work with dask arrays.
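To make the idea concrete, here is a minimal sketch of a memory-mapped record and a dict conversion that keeps large fields lazy instead of dropping them. Everything in it is an assumption for illustration: `toy_prologue` stands in for the real `hrit_prologue` dtype, `HEADER_LENGTH` for `self.mda['total_header_length']`, and `record_to_dict` is a hypothetical helper, not satpy's `recarray2dict`. The dask step is optional and only wraps the mapped array with `dask.array.from_array`.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the (much larger) hrit_prologue dtype.
toy_prologue = np.dtype([
    ("satellite_id", ">u2"),
    ("nominal_longitude", ">f4"),
    ("coefficients", ">f8", (200,)),  # a "big" field with more than 100 elements
])

HEADER_LENGTH = 16  # made-up value standing in for mda['total_header_length']


def write_toy_file(path):
    """Write a fake header followed by a single toy prologue record."""
    record = np.zeros(1, dtype=toy_prologue)
    record["satellite_id"] = 323
    record["nominal_longitude"] = 9.5
    record["coefficients"] = np.arange(200)
    with open(path, "wb") as fp_:
        fp_.write(b"\x00" * HEADER_LENGTH)
        record.tofile(fp_)


def open_prologue(path):
    """Map the record that follows the header without reading it eagerly."""
    return np.memmap(path, dtype=toy_prologue, shape=(1,),
                     offset=HEADER_LENGTH, mode="r")


def record_to_dict(records, max_eager=100):
    """Convert the first record to a dict, keeping big fields as mapped views."""
    out = {}
    for name in records.dtype.names:
        field = records[name]
        if field.dtype.names is not None:
            # Nested structure: recurse on the sub-array.
            out[name] = record_to_dict(field, max_eager)
            continue
        value = field[0]
        if np.size(value) > max_eager:
            out[name] = value  # left as a view on the mapped file
        elif np.ndim(value) == 0:
            out[name] = value.item()  # plain Python scalar
        else:
            out[name] = np.array(value)  # small array, copied into memory
    return out


def wrap_in_dask(records):
    """Optional: expose the mapped records as a single-chunk dask array."""
    import dask.array as da  # imported here so the rest runs without dask
    return da.from_array(records, chunks=records.shape)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "toy_prologue.bin")
        write_toy_file(path)
        metadata = record_to_dict(open_prologue(path))
        print(metadata["satellite_id"], metadata["coefficients"].shape)
```

In this sketch the small fields become ordinary Python values, while `coefficients` is only read from disk when it is actually indexed.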

mraspaud, Dec 22 '21 08:12

Would that have any effect on the other readers that use recarray2dict? I think the Electro-L reader does, for example...

simonrp84, Dec 22 '21 09:12

It might indeed. The best would probably be to implement this change in all HRIT readers at once.

mraspaud, Dec 22 '21 10:12

Just curious, does memmap accept a file pointer/handle or a filename string? Does that change how it is handled at the low level? What about how dask sees it when you pass it to from_array? I'm thinking that in the long term we may want to use more filename-based stuff, but I think as-is dask will just send the data to any distributed workers (threaded workers probably aren't a problem).
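On the first question, `np.memmap` is documented to take either a filename (or path) or an open file object. The sketch below shows both, plus one possible filename-based pattern in which the map is opened inside the function that needs the data, so that only the path and a few numbers would have to be passed to a worker. The helper names are hypothetical, and whether this is what dask should do here is left open; a function like `read_slice` could in principle be wrapped with `dask.delayed`, which is not shown.

```python
import os
import tempfile

import numpy as np


def map_from_filename(path, dtype, offset=0):
    """Open a read-only memory map from a filename."""
    return np.memmap(path, dtype=dtype, mode="r", offset=offset)


def map_from_handle(path, dtype, offset=0):
    """Open a read-only memory map from an already opened binary file object."""
    with open(path, "rb") as fp_:
        return np.memmap(fp_, dtype=dtype, mode="r", offset=offset)


def read_slice(path, dtype, offset, start, stop):
    """Filename-based access: map the file here and copy only the slice needed."""
    mapped = np.memmap(path, dtype=dtype, mode="r", offset=offset)
    return np.array(mapped[start:stop])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "values.bin")
        with open(path, "wb") as fp_:
            fp_.write(b"\x00" * 4)  # fake 4-byte header
            np.arange(10, dtype="<i4").tofile(fp_)
        print(read_slice(path, "<i4", offset=4, start=2, stop=5))
```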

djhoese, Dec 22 '21 16:12

A comment, as I was checking the bunzip2 case: such a proposal requires the file to remain open, which would not be possible with compressed (bz2, for instance) files on disk.
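One way to picture the constraint: a plain file can be mapped, while a bz2 file has to be decompressed first, so the metadata ends up in memory anyway. The following is only a sketch of such a fallback; `open_records` is a hypothetical helper and says nothing about how satpy actually handles compressed HRIT files.

```python
import bz2
import os
import tempfile

import numpy as np


def open_records(path, dtype, offset=0, count=1):
    """Return `count` records after `offset`: mapped if plain, in memory if bz2."""
    dtype = np.dtype(dtype)
    if str(path).endswith(".bz2"):
        # Compressed bytes on disk cannot be mapped as records,
        # so decompress everything and view the resulting buffer.
        with bz2.open(path, "rb") as fp_:
            payload = fp_.read()
        return np.frombuffer(payload, dtype=dtype, count=count, offset=offset)
    return np.memmap(path, dtype=dtype, mode="r", offset=offset, shape=(count,))


if __name__ == "__main__":
    toy = np.dtype([("a", ">u2"), ("b", ">f4")])  # made-up record layout
    record = np.zeros(1, dtype=toy)
    record["a"] = 7
    raw = b"\x00" * 8 + record.tobytes()  # fake 8-byte header plus one record
    with tempfile.TemporaryDirectory() as tmpdir:
        plain = os.path.join(tmpdir, "toy.bin")
        packed = plain + ".bz2"
        with open(plain, "wb") as fp_:
            fp_.write(raw)
        with bz2.open(packed, "wb") as fp_:
            fp_.write(raw)
        print(type(open_records(plain, toy, offset=8)).__name__)
        print(type(open_records(packed, toy, offset=8)).__name__)
```

Both branches return an array with the same dtype and shape, so downstream code would not need to care which one was taken; only the plain-file branch stays lazy.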

pdebuyl, Feb 24 '22 09:02