Dieter Weber
> He was in favor of storing array data in raw binary form and the metadata in a separate file or simple header; I would be curious about his opinion...
> to be honest I've never actually experienced significant slowdowns due to rechunking, only by trying to do operations on inappropriately chunked data. On the threaded executor rechunking is quite...
> Since HDF5 can chunk in any way and applications may have other constraints on access pattern and chunk size, this makes code that reads HDF5 efficiently rather complex. ...and...
@din14970 Oh, that is neat as well! I didn't know one could "dispatch" data to different categories like this in a `dtype` definition. Interesting! If this method or the folding...
@CSSFrancis regarding your nice summary: An additional point could be for compressed chunked data in combination with Dask arrays that the chunks in the dataset shouldn't necessarily correspond to chunks...
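To illustrate the chunking point, here is a minimal sketch of how one might pick Dask chunks that are larger than the chunks stored in the file but still aligned to them, so that no compressed file chunk has to be decompressed by two different tasks. `aligned_chunks`, the byte budget and the example shapes are all hypothetical and not taken from HyperSpy or Dask; it only uses the standard library.

```python
import math


def aligned_chunks(shape, file_chunks, itemsize, target_bytes):
    """Hypothetical helper: grow chunks in whole multiples of the file chunks.

    Starts from the file chunk shape and enlarges one axis at a time,
    last axis first, until the byte budget is used up.
    """
    chunks = list(file_chunks)
    for axis in reversed(range(len(shape))):
        # Size of one "slab" that is a single file chunk thick along this axis
        other = math.prod(c for i, c in enumerate(chunks) if i != axis)
        slab_bytes = other * file_chunks[axis] * itemsize
        # At least one file chunk, even if that already exceeds the budget
        n_fit = max(1, target_bytes // slab_bytes)
        n_max = math.ceil(shape[axis] / file_chunks[axis])
        chunks[axis] = min(shape[axis], min(n_fit, n_max) * file_chunks[axis])
    return tuple(chunks)


# Assumed example: 4D dataset of float32 stored with small chunks in the file
shape = (64, 64, 256, 256)
file_chunks = (4, 4, 32, 32)
dask_chunks = aligned_chunks(shape, file_chunks, itemsize=4,
                             target_bytes=32 * 1024 ** 2)
```

The resulting tuple could then be passed as `chunks=` when building the Dask array. Whether a 32 MiB budget is sensible depends on the machine and the workload, so treat the number as a placeholder.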
In the context of the mmap example, the delayed function that creates the array chunks would return the lazy wrapper / proxy instead of slicing the HDF5 file immediately. Dask...
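A rough sketch of the proxy idea, to make it concrete. This is not the code from the notebook: `LazySlice` is a hypothetical class, and a raw binary file read through `np.memmap` stands in for the HDF5 dataset so that the example runs with NumPy alone. The proxy only stores the file location and the slice, and the data is read when something asks for an actual array.

```python
import os
import tempfile

import numpy as np


class LazySlice:
    """Hypothetical proxy: remembers where the data lives, reads it on demand."""

    def __init__(self, path, dtype, shape, index):
        self.path = path
        self.dtype = np.dtype(dtype)
        self.full_shape = shape
        self.index = index
        self.reads = 0  # counts how often the file was actually opened

    def __array__(self, dtype=None, copy=None):
        # Open the file only now, copy out the requested region, release it again
        self.reads += 1
        source = np.memmap(self.path, dtype=self.dtype, mode="r",
                           shape=self.full_shape)
        data = np.array(source[self.index])
        del source
        return data if dtype is None else data.astype(dtype)


# Write a small raw binary file to stand in for the real dataset
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "frames.raw")
reference = np.arange(4 * 6 * 6, dtype=np.float32).reshape(4, 6, 6)
reference.tofile(path)

proxy = LazySlice(path, np.float32, reference.shape, np.s_[1:3, :, 2:5])
reads_before = proxy.reads   # creating the proxy has not touched the file
loaded = np.asarray(proxy)   # the read happens here
```

In the delayed-function setting, the function would return such a proxy instead of the sliced data, and the read would be deferred to wherever the array is finally materialized.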
Here is a notebook that compares the "lazy HDF5 reader" method with normal loading from Hyperspy. Summary of results: Small chunks in the HDF5 file, larger Dask array chunks and...
If this is of interest, I could prepare a pull request to use this method for opening HDF5 files instead of the current implementation.
FYI, today I worked a bit more on the code. It is now faster than the current HDF5 loading in Hyperspy under almost all circumstances. It is generally insensitive to...
@magnunor thank you for testing it, interesting results! I've tried it out and found two items that improved performance dramatically: * Start with a single chunk Dask array from `hdf5_dask_array()`...