aicsimageio

Optimization of default chunk_dims if not explicitly provided

Open evamaxfield opened this issue 4 years ago • 4 comments

Use Case

Please provide a use case to help us understand your request in context

Always attempt to make smart choices for the user in terms of which chunk dims to read files with. This will speed up reading and processing greatly if we do it correctly.

Solution

Please describe your ideal solution

Potentially, start with all AICSImage dimensions as chunk dims and check the size of the resulting chunk. If the chunk is larger than some threshold, drop a dimension based on the file's dimensions (e.g., if there is no Z dimension, or Z is a singleton, drop Z from the chunk dims). Keep iterating until the chunk size is smaller than the threshold.
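
A minimal sketch of that loop, in plain Python. Everything here is hypothetical and not part of aicsimageio: the function name `pick_chunk_dims`, the `drop_order` of T, then C, then Z, and the 1 GiB threshold are all assumptions made for illustration.

```python
from math import prod


def pick_chunk_dims(shape, itemsize, max_chunk_bytes, drop_order="TCZ"):
    """Pick chunk dims so that one chunk stays under max_chunk_bytes.

    shape: dict of dimension name -> size, in storage order.
    drop_order: assumed priority for dropping dims (an illustration only).
    """
    # Start with every non-singleton dimension as a chunk dim;
    # missing or singleton dims do not change the chunk size.
    chunk_dims = [d for d, size in shape.items() if size > 1]

    def chunk_bytes(dims):
        return prod(shape[d] for d in dims) * itemsize

    # Drop one dimension at a time until the chunk fits the threshold.
    # Y and X are never dropped, so a single plane is the smallest chunk.
    for dim in drop_order:
        if chunk_bytes(chunk_dims) <= max_chunk_bytes:
            break
        if dim in chunk_dims:
            chunk_dims.remove(dim)

    return "".join(chunk_dims)


# Example: a 16-bit TCZYX image and an assumed 1 GiB threshold.
shape = {"T": 10, "C": 3, "Z": 60, "Y": 2048, "X": 2048}
print(pick_chunk_dims(shape, itemsize=2, max_chunk_bytes=1024**3))
```

With these numbers the full image and the CZYX stack are both over the threshold, while a single ZYX stack (480 MiB) fits, so the sketch settles on "ZYX".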

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them

evamaxfield Jul 12 '21 22:07

Thinking: this has major implications for fast rendering and loading in napari as well.

Probably going to bump up priority on this because it's both fun and useful.

evamaxfield Jul 16 '21 16:07

@toloudis also just had an interesting idea about how to optimize. Thinking out loud:

We could construct the dask array chunks on the first data access attempt (e.g., get_image_dask_data("ZYX") sets the chunk dims to "ZYX" because that is what was requested).

This would be a change from constructing the dask array on object init. It also further pushes us towards fully delayed dask array construction / xarray construction as we have seen other issues with that.

Init should maybe just be a metadata-only read, and then the get_image_dask_data call or any getitem on dask_data should construct the dask array.
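
A rough sketch of that deferral pattern, without dask so it runs anywhere. `LazyImage` is a hypothetical stand-in, not the AICSImage class, and `_build_array` is a placeholder for the real dask array construction; only the method name `get_image_dask_data` is borrowed from the discussion above.

```python
class LazyImage:
    """Hypothetical stand-in for an image object; not the AICSImage API."""

    def __init__(self, shape):
        # Init only records metadata; no array is built here.
        self.shape = dict(shape)
        self.chunk_dims = None
        self._array = None
        self.build_count = 0

    def _build_array(self, chunk_dims):
        # Placeholder for the real (expensive) dask array construction.
        self.build_count += 1
        return {"dims": "".join(self.shape), "chunk_dims": chunk_dims}

    def get_image_dask_data(self, dims):
        # The first access decides the chunking from the requested dims.
        if self._array is None:
            self.chunk_dims = dims
            self._array = self._build_array(dims)
        return self._array


img = LazyImage({"T": 10, "C": 3, "Z": 60, "Y": 2048, "X": 2048})
print(img.chunk_dims)  # nothing built yet
img.get_image_dask_data("ZYX")
print(img.chunk_dims)  # chunking fixed by the first request
```

What should happen when a later call asks for different dims (rechunk, rebuild, or keep the first choice) is left open here; the sketch simply keeps the first choice.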

evamaxfield Sep 24 '21 22:09

Yup the basic idea is to set up the chunking more lazily based on the first access call that says what data is being requested from the image. That data access pattern (the dims being asked for by the caller) can help inform how to set up the dask chunks.

Just a couple more details / things to consider:

  • is it possible that YX chunks will be more optimal even if ZYX was requested in the get_image_data call? Can we make a good guess internally?
  • can we make a better chunk guess knowing the size of the dimensions and roughly estimating how much memory makes a good chunk size?
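
One way to approach both questions is to estimate a chunk shape from a memory budget. The sketch below is only an illustration: `guess_chunk_shape` is a hypothetical helper, the 128 MiB and 8 MiB targets are assumed values, and it assumes the dimensions are listed from slowest to fastest varying.

```python
def guess_chunk_shape(shape, itemsize, target_bytes):
    """Guess a per-dimension chunk shape that fits within target_bytes.

    shape: dict of dimension name -> size, slowest-varying dim first.
    """
    chunk = {dim: 1 for dim in shape}
    remaining = target_bytes // itemsize  # elements that fit in the budget

    # Fill the budget starting from the fastest-varying dims (X, then Y),
    # then take as many steps along the outer dims as still fit.
    for dim in reversed(list(shape)):
        take = max(1, min(shape[dim], remaining))
        chunk[dim] = take
        remaining //= take

    return chunk


shape = {"Z": 100, "Y": 2048, "X": 2048}  # 16-bit pixels, 8 MiB per plane

# Assumed 128 MiB budget: 16 full planes per chunk.
print(guess_chunk_shape(shape, itemsize=2, target_bytes=128 * 1024**2))

# Assumed 8 MiB budget: falls back to single YX planes even for a ZYX request.
print(guess_chunk_shape(shape, itemsize=2, target_bytes=8 * 1024**2))
```

Under these assumptions, whether YX or ZYX chunks come out depends only on how many planes fit in the budget, not on which dims the caller asked for.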

toloudis Sep 24 '21 22:09

Another idea is to try applying map_blocks and test whether it improves performance over the current implementation in OmeTiffReader.

toloudis Sep 24 '21 22:09

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] Mar 29 '23 02:03