dask-image
dask_image.imread.imread regression
It appears that the more recent upgrades to dask-image's imread have broken the use case of reading multiple 3D tiffs. (Probably the problem is more general than that, but this is my test case.) To reproduce:
import os
import tempfile

import numpy as np
import tifffile
from dask_image.imread import imread
from skimage import data

blobs = data.binary_blobs(64, n_dim=3)
with tempfile.TemporaryDirectory() as tmpdir:
    for i in range(5):
        # imwrite is tifffile's current name for the older imsave
        tifffile.imwrite(os.path.join(tmpdir, f'{i}.tiff'), blobs)
    image = imread(os.path.join(tmpdir, '*.tiff'))
    print(image)
    timepoint = np.asarray(image[0])
Produces the following print output:
dask.array<_map_read_frame, shape=(5, 64, 64, 64), dtype=bool, chunksize=(1, 64, 64, 64), chunktype=numpy.ndarray>
And this traceback:
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-7-7f255eff488b> in <module>
4 image = imread(os.path.join(tmpdir, '*.tiff'))
5 print(image)
----> 6 timepoint = np.asarray(image[0])
7
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order, like)
100 return _asarray_with_like(a, dtype=dtype, order=order, like=like)
101
--> 102 return array(a, dtype, copy=False, order=order)
103
104
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
1474
1475 def __array__(self, dtype=None, **kwargs):
-> 1476 x = self.compute()
1477 if dtype and x.dtype != dtype:
1478 x = x.astype(dtype)
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/base.py in compute(self, **kwargs)
283 dask.base.compute
284 """
--> 285 (result,) = compute(self, traverse=False, **kwargs)
286 return result
287
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
565 postcomputes.append(x.__dask_postcompute__())
566
--> 567 results = schedule(dsk, keys, **kwargs)
568 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
569
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
77 pool = MultiprocessingPoolExecutor(pool)
78
---> 79 results = get_async(
80 pool.submit,
81 pool._max_workers,
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
512 _execute_task(task, data) # Re-execute locally
513 else:
--> 514 raise_exception(exc, tb)
515 res, worker_id = loads(res_info)
516 state["cache"][key] = res
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/local.py in reraise(exc, tb)
323 if exc.__traceback__ is not tb:
324 raise exc.with_traceback(tb)
--> 325 raise exc
326
327
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
221 try:
222 task, data = loads(task_info)
--> 223 result = _execute_task(task, data)
224 id = get_id()
225 result = dumps((result, id))
~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
119 # temporaries by their reference count and can execute certain
120 # operations in-place.
--> 121 return func(*(_execute_task(a, cache) for a in args))
122 elif not ishashable(arg):
123 return arg
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
This works fine with dask-image 0.5.0.
Environment:
- Dask version: 2021.04.1
- Python version: 3.9.4
- Operating System: Linux
- Install method (conda, pip, source): pip
CC @DragaDoncila, who first discovered the bug.
Thanks for the report!
What happens if you use dask.array.image.imread(), rather than the dask-image imread? https://docs.dask.org/en/latest/array-api.html?highlight=images#image-support
(EDITED: corrected the import path)
Do I need to install something else to get that? I don't have an image module in my dask.
Found it: I had to explicitly import dask.array.image.
That works; here's the output:
In [13]: import dask.array.image
In [14]: with tempfile.TemporaryDirectory() as tmpdir:
    ...:     for i in range(5):
    ...:         tifffile.imsave(os.path.join(tmpdir, f'{i}.tiff'), blobs)
    ...:     image = dask.array.image.imread(os.path.join(tmpdir, '*.tiff'))
    ...:     print(image)
    ...:     timepoint = np.asarray(image[0])
    ...:
dask.array<imread, shape=(5, 64, 64, 64), dtype=bool, chunksize=(1, 64, 64, 64), chunktype=numpy.ndarray>
In [15]: timepoint.shape
Out[15]: (64, 64, 64)
We really need to benchmark all these imread variations.
Would you be able to share some of your 3D tiffs @jni? Ideally the benchmarks should be close to real-world use cases.
There is a duplicate report at https://github.com/dask/dask-image/issues/218; I'll close that one and keep the discussion in this thread.
This is worth documenting, at least. Raised here: https://forum.image.sc/t/dask-array-strange-behaviour-for-label-images/52666
I wanted to draw your attention to an issue with the imread function when reading a label image. The usual tifffile imread function reads the dimensions correctly, but the dask imread function completely misses the time and Z dimensions and seems to read only X and Y properly. For a float-type image it reads everything properly.
Reply:
So, dask-image calls out to the pims library, and pims needs a bundle_axes keyword argument to show more than just the xy dimensions (see http://soft-matter.github.io/pims/v0.5/multidimensional.html). The first thing I'd try is passing a bundle_axes keyword argument through your dask-image imread call to see if that works. The second easiest option might be the imread function that lives in dask itself (not dask-image), dask.array.image.imread() (see the Dask API documentation). That function predates the dask-image library. By default it uses the scikit-image imread function rather than pims, but you can also pass in your own imread function, such as tifffile.imread.
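A minimal sketch of that second option, assuming a dask version that still ships dask.array.image and that tifffile is installed. Passing imread=tifffile.imread means each file becomes one chunk holding its whole volume, so a stack of 3D TIFFs comes out 4D. read_tiff_stack is a hypothetical helper name, not part of either library.

```python
import dask.array.image  # dask's own imread; this submodule must be imported explicitly
import tifffile


def read_tiff_stack(pattern):
    """Lazily stack every file matching `pattern` along a new leading axis.

    dask.array.image.imread globs and sorts the filenames, reads the first
    file eagerly to learn the shape and dtype, and makes one chunk per file.
    Using tifffile.imread as the reader keeps each file's full Z stack intact.
    """
    return dask.array.image.imread(pattern, imread=tifffile.imread)
```

For the example at the top of this issue, read_tiff_stack(os.path.join(tmpdir, '*.tiff')) would be expected to give shape (5, 64, 64, 64) with one (1, 64, 64, 64) chunk per file.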
Just stumbled across this as well (and only thought of looking at existing issues after fumbling around for an hour). Is there an easy workaround for reading 3D data other than creating my array from dask.delayed?
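For comparison, the dask.delayed route mentioned in the question looks roughly like the sketch below. It is an illustration, not an official API: stack_with_delayed is a hypothetical helper, reader stands in for any path-to-NumPy function such as tifffile.imread, and it assumes every file has the same shape and dtype as the first one.

```python
import dask
import dask.array as da


def stack_with_delayed(filenames, reader):
    """Build one lazy dask array from many files via dask.delayed."""
    filenames = sorted(filenames)  # glob order is arbitrary, so sort first
    sample = reader(filenames[0])  # eager read of one file for shape/dtype
    lazy_reader = dask.delayed(reader, pure=True)
    arrays = [
        da.from_delayed(lazy_reader(fn), shape=sample.shape, dtype=sample.dtype)
        for fn in filenames
    ]
    # stacking adds a new leading axis with one chunk per file
    return da.stack(arrays)
```

Usage would be along the lines of stack_with_delayed(pathlib.Path(".").glob("*.tif"), tifffile.imread). Injecting the reader keeps the helper independent of any one file format.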
Just adding the minimal example I created (not much different from @jni's example), but in my case it is multi-channel data that causes the same problem:
https://gist.github.com/VolkerH/d0286d606c2850be7d220642a2842806
When you change the shape to (2060, 2048) it works as expected.
OK, for multi-channel/volumetric files I am now using the following workaround. It most likely comes with a performance penalty.
import pathlib
import dask.array as da
import dask_image.imread
lazy_array = da.stack([dask_image.imread.imread(f) for f in sorted(pathlib.Path(".").glob("*.tif"))])
I believe the current workarounds are:
- Revert to an earlier version of dask-image (0.5.0 works), or
- Use the imread that ships with dask itself:
  from dask.array.image import imread
It's on my to-do list to benchmark all these variations and fix the regression. (No timeframe on that, though)
More discussion at https://github.com/dask/dask/pull/8385
@VolkerH, to get around the performance issue with da.stack, here's my example of loading many TIFFs using map_blocks:
https://github.com/jni/napari-demos/blob/6860b1fe86a51b30874c34150ae216f4c39b2dd6/rootomics.py#L20-L54
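The linked script is not reproduced here. As a sketch of the general map_blocks pattern (not necessarily what rootomics.py does), a single map_blocks call can build the whole array with one chunk per file, instead of stacking many separately created arrays. stack_with_map_blocks and _read_block are hypothetical names; reader is any path-to-array function such as tifffile.imread, and all files are assumed to match the first file's shape and dtype.

```python
import numpy as np
import dask.array as da


def _read_block(block_id=None, filenames=(), reader=None):
    # dask fills in block_id with this chunk's index along every axis; only
    # axis 0 has more than one chunk, so block_id[0] selects the file
    return reader(filenames[block_id[0]])[np.newaxis, ...]


def stack_with_map_blocks(filenames, reader):
    """Lazily stack files with a single map_blocks call (one chunk per file)."""
    filenames = tuple(sorted(filenames))  # glob order is arbitrary
    sample = reader(filenames[0])  # eager read of one file for shape/dtype
    chunks = ((1,) * len(filenames),) + tuple((n,) for n in sample.shape)
    return da.map_blocks(
        _read_block,
        filenames=filenames,
        reader=reader,
        chunks=chunks,
        dtype=sample.dtype,
        # an explicit empty meta array lets dask skip inferring the output
        # type by calling _read_block with a placeholder block_id
        meta=np.empty((0,) * (sample.ndim + 1), dtype=sample.dtype),
    )
```

Because no input array is passed, dask derives the output dimensions from chunks. The resulting graph is a single blockwise layer rather than one layer per file plus a stack, which is likely where the speed-up over da.stack comes from.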