Dieter Weber comments

Results: 225 comments of Dieter Weber

Saving and Loading Large Datasets

> He was in favor of storing array data in raw binary form and the metadata in a separate file or simple header; I would be curious about his opinion...
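
A minimal sketch of the "raw binary plus separate metadata" idea, assuming NumPy and a JSON sidecar file. The `.raw`/`.json` naming and the helpers `save_raw`/`load_raw` are hypothetical, not taken from the discussion. The array is written as headerless bytes, and loading memory-maps it, so only the parts that are actually accessed are read from disk.

```python
import json
import os
import tempfile

import numpy as np


def save_raw(path_stem, data):
    """Write `data` as headerless raw bytes plus a JSON sidecar holding shape and dtype."""
    data = np.ascontiguousarray(data)
    data.tofile(path_stem + ".raw")  # raw C-order bytes, no header
    meta = {"shape": list(data.shape), "dtype": data.dtype.str}  # e.g. "<u2"
    with open(path_stem + ".json", "w") as f:
        json.dump(meta, f)


def load_raw(path_stem):
    """Memory-map the raw file; data is only read from disk when it is accessed."""
    with open(path_stem + ".json") as f:
        meta = json.load(f)
    return np.memmap(path_stem + ".raw", dtype=np.dtype(meta["dtype"]),
                     mode="r", shape=tuple(meta["shape"]))


# Usage: a small 4D "scan" written to a temporary directory.
stem = os.path.join(tempfile.mkdtemp(), "scan")
save_raw(stem, np.arange(2 * 3 * 8 * 8, dtype=np.uint16).reshape(2, 3, 8, 8))
frames = load_raw(stem)
print(frames.shape, frames.dtype)  # (2, 3, 8, 8) uint16
```

The trade-off is that the sidecar carries no chunking or compression information: the layout is fixed by the shape, which is exactly what makes memory mapping trivial.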

> To be honest I've never actually experienced significant slowdowns due to rechunking, only by trying to do operations on inappropriately chunked data. On the threaded executor rechunking is quite...

> Since HDF5 can chunk in any way and applications may have other constraints on access pattern and chunk size, this makes code that reads HDF5 efficiently rather complex. ...and...
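
To illustrate why the chunk layout and the access pattern interact, here is a small sketch with hypothetical helpers `chunks_touched` and `elements_read`, assuming plain step-1 slices with non-negative bounds. It counts which storage chunks a read overlaps: with 4D data chunked as `(16, 16, 16, 16)`, reading a single 64 x 64 frame touches 16 chunks and must decompress 256 times more elements than requested, while frame-shaped chunks need only one.

```python
import itertools
import math


def chunks_touched(chunk_shape, index):
    """Grid coordinates of every storage chunk that a basic slice overlaps.

    Assumes `index` is a tuple of slices with explicit, non-negative start/stop,
    stop > start, and step 1.
    """
    ranges = []
    for size, sl in zip(chunk_shape, index):
        first = sl.start // size
        last = (sl.stop - 1) // size  # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(itertools.product(*ranges))


def elements_read(chunk_shape, index):
    """Number of elements that must be read/decompressed for this slice."""
    return len(chunks_touched(chunk_shape, index)) * math.prod(chunk_shape)


# Read one 64 x 64 frame from 4D data chunked as (16, 16, 16, 16):
frame = (slice(0, 1), slice(0, 1), slice(0, 64), slice(0, 64))
print(len(chunks_touched((16, 16, 16, 16), frame)))  # 16
print(elements_read((16, 16, 16, 16), frame) // (64 * 64))  # 256
# The same read with frame-shaped chunks touches exactly one chunk:
print(len(chunks_touched((1, 1, 64, 64), frame)))  # 1
```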

@din14970 Oh, that is neat as well! I didn't know one could "dispatch" data to different categories like this in a `dtype` definition. Interesting! If this method or the folding...
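
For readers unfamiliar with the trick: a NumPy structured `dtype` can split each fixed-size record into named fields, and selecting a field returns a strided view rather than a copy. The frame layout below (an 8-byte header followed by 4 x 4 `uint16` pixels) and the helper `split_frames` are made-up illustrations, not the format discussed in the thread.

```python
import numpy as np

# Hypothetical record layout: each frame = 8-byte header + 4x4 little-endian uint16 pixels.
frame_dtype = np.dtype([
    ("header", np.uint8, (8,)),
    ("pixels", "<u2", (4, 4)),
])


def split_frames(buffer):
    """Interpret raw bytes as records and return (headers, pixels) as strided views."""
    records = np.frombuffer(buffer, dtype=frame_dtype)
    return records["header"], records["pixels"]


# Build three fake frames in memory, then "dispatch" the bytes back into fields.
fake = np.zeros(3, dtype=frame_dtype)
fake["header"] = 0xAB
fake["pixels"] = np.arange(3 * 16, dtype=np.uint16).reshape(3, 4, 4)
headers, pixels = split_frames(fake.tobytes())
print(frame_dtype.itemsize, headers.shape, pixels.shape)  # 40 (3, 8) (3, 4, 4)
```

The same field selection works on `np.memmap(path, dtype=frame_dtype, mode="r")`, so the headers can be skipped without copying the pixel data.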

@CSSFrancis regarding your nice summary: one additional point, for compressed chunked data used with Dask arrays, is that the chunks in the dataset shouldn't necessarily correspond to chunks...
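
One way to act on this point, sketched under assumptions: choose Dask block boundaries along each axis as multiples of the file's chunk size, so that a compressed chunk is never split between two Dask tasks and decompressed twice. The helper `aligned_chunks` and its "target length" heuristic are hypothetical; the per-axis tuple it returns has the explicit block-size form that Dask accepts for `chunks` (one such tuple per dimension).

```python
def aligned_chunks(length, storage_chunk, target):
    """Dask-style block sizes along one axis, each full block a multiple of `storage_chunk`.

    `target` is the approximate desired block length; the last block holds the
    remainder, so the sizes always sum to `length`.
    """
    n = max(1, round(target / storage_chunk))  # storage chunks per Dask block
    step = n * storage_chunk
    sizes = [step] * (length // step)
    if length % step:
        sizes.append(length % step)
    return tuple(sizes)


# A 256-frame axis stored in chunks of 16, aiming for Dask blocks of ~100:
print(aligned_chunks(256, 16, 100))  # (96, 96, 64)
```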

In the context of the mmap example, the delayed function that creates the array chunks would return the lazy wrapper / proxy instead of slicing the HDF5 file immediately. Dask...
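
A dependency-free sketch of such a proxy, assuming a callable that returns the dataset; for an HDF5 file it would open the file and return the dataset object, while here a NumPy array stands in. The class name `LazySlice` and its attributes are hypothetical, and wrapping its construction in `dask.delayed` is left out so the example runs without Dask.

```python
import numpy as np


class LazySlice:
    """Hypothetical proxy: stores *what* to read and reads it only when converted to an array."""

    def __init__(self, open_dataset, index):
        self.open_dataset = open_dataset  # callable returning an array-like, e.g. an HDF5 dataset
        self.index = index                # tuple of slices into that dataset
        self.reads = 0                    # counts actual reads, for illustration

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray(); this is the only place the source is touched.
        self.reads += 1
        data = np.asarray(self.open_dataset()[self.index])
        return data if dtype is None else data.astype(dtype)


source = np.arange(100).reshape(10, 10)  # stands in for an on-disk dataset
proxy = LazySlice(lambda: source, (slice(2, 4), slice(0, 5)))
print(proxy.reads)  # 0: nothing read yet
block = np.asarray(proxy)
print(block.shape)  # (2, 5)
```

The design choice is that creating the proxy is cheap and picklable-in-spirit (it holds only a recipe), so the scheduler decides when and where the actual read happens.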

Here is a notebook that compares the "lazy HDF5 reader" method with normal loading from Hyperspy. Summary of results: Small chunks in the HDF5 file, larger Dask array chunks and...

If this is of interest, I could prepare a pull request to use this method for opening HDF5 files instead of the current implementation.

FYI, today I worked a bit more on the code. It is now faster than the current HDF5 loading in Hyperspy under almost all circumstances. It is generally insensitive to...

@magnunor thank you for testing it, interesting results! I've tried it out and found two items that improved performance dramatically:

* Start with a single chunk Dask array from `hdf5_dask_array()`...