dask_image.imread.imread regression

Open jni opened this issue 4 years ago • 10 comments

It appears that the more recent upgrades to dask-image's imread have broken the use case of reading multiple 3D tiffs. (Probably the problem is more general than that, but this is my test case.) To reproduce:

import os
import numpy as np
from dask_image.imread import imread
import tifffile
import tempfile
from skimage import data


blobs = data.binary_blobs(64, n_dim=3)
with tempfile.TemporaryDirectory() as tmpdir:
    for i in range(5):
        tifffile.imsave(os.path.join(tmpdir, f'{i}.tiff'), blobs)
    image = imread(os.path.join(tmpdir, '*.tiff'))
    print(image)
    timepoint = np.asarray(image[0])

Produces the following print output:

dask.array<_map_read_frame, shape=(5, 64, 64, 64), dtype=bool, chunksize=(1, 64, 64, 64), chunktype=numpy.ndarray>

And this traceback:

IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-7f255eff488b> in <module>
      4     image = imread(os.path.join(tmpdir, '*.tiff'))
      5     print(image)
----> 6     timepoint = np.asarray(image[0])
      7 

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
   1474 
   1475     def __array__(self, dtype=None, **kwargs):
-> 1476         x = self.compute()
   1477         if dtype and x.dtype != dtype:
   1478             x = x.astype(dtype)

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/base.py in compute(self, **kwargs)
    283         dask.base.compute
    284         """
--> 285         (result,) = compute(self, traverse=False, **kwargs)
    286         return result
    287 

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
    565         postcomputes.append(x.__dask_postcompute__())
    566 
--> 567     results = schedule(dsk, keys, **kwargs)
    568     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    569 

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     77             pool = MultiprocessingPoolExecutor(pool)
     78 
---> 79     results = get_async(
     80         pool.submit,
     81         pool._max_workers,

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    512                             _execute_task(task, data)  # Re-execute locally
    513                         else:
--> 514                             raise_exception(exc, tb)
    515                     res, worker_id = loads(res_info)
    516                     state["cache"][key] = res

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/local.py in reraise(exc, tb)
    323     if exc.__traceback__ is not tb:
    324         raise exc.with_traceback(tb)
--> 325     raise exc
    326 
    327 

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    221     try:
    222         task, data = loads(task_info)
--> 223         result = _execute_task(task, data)
    224         id = get_id()
    225         result = dumps((result, id))

~/miniconda3/envs/new-dask-image/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed

This works fine with dask-image 0.5.0.

Environment:

  • Dask version: 2021.04.1
  • Python version: 3.9.4
  • Operating System: Linux
  • Install method (conda, pip, source): pip

CC @DragaDoncila, who first discovered the bug.

jni avatar May 12 '21 07:05 jni

Thanks for the report!

What happens if you use dask.array.image.imread(), rather than the dask-image imread? https://docs.dask.org/en/latest/array-api.html?highlight=images#image-support

(EDITED: corrected the import path)

GenevieveBuckley avatar May 12 '21 07:05 GenevieveBuckley

Do I need to install something else to get that? I don't have an image module in my dask.

jni avatar May 12 '21 07:05 jni

Found it — had to explicitly import dask.array.image.

That works, here's the output:

In [13]: import dask.array.image

In [14]: with tempfile.TemporaryDirectory() as tmpdir:
    ...:     for i in range(5):
    ...:         tifffile.imsave(os.path.join(tmpdir, f'{i}.tiff'), blobs)
    ...:     image = dask.array.image.imread(os.path.join(tmpdir, '*.tiff'))
    ...:     print(image)
    ...:     timepoint = np.asarray(image[0])
    ...: 
dask.array<imread, shape=(5, 64, 64, 64), dtype=bool, chunksize=(1, 64, 64, 64), chunktype=numpy.ndarray>

In [15]: timepoint.shape
Out[15]: (64, 64, 64)

jni avatar May 12 '21 07:05 jni

We really need to benchmark all these imread variations.

Would you be able to share some of your 3D tiffs @jni? Ideally the benchmarks should be close to real-world use cases.

GenevieveBuckley avatar May 13 '21 06:05 GenevieveBuckley

There is a duplicate report here: https://github.com/dask/dask-image/issues/218 I'll close that, and keep the discussion in this thread.

This is worth documenting, at least. Raised here: https://forum.image.sc/t/dask-array-strange-behaviour-for-label-images/52666

I wanted to draw your attention to an issue with the imread function while trying to read a label image. The usual tifffile imread function reads the dimensions correctly, but the dask imread function totally misses the time and Z dimensions and seems to only read X and Y properly. For a float-type image it reads it properly.

Reply:

So, dask-image calls out to the pims library, and pims needs a bundle_axes keyword argument to properly show more than just the xy dimensions: http://soft-matter.github.io/pims/v0.5/multidimensional.html The first thing I'd try is to pass a bundle_axes keyword argument through your dask-image imread call and see if that works. The second easiest option might be to try the imread function that lives in dask itself (not dask-image): dask.array.image.imread() (this function pre-dates the dask-image library; by default it uses the scikit-image imread function rather than pims, but you can also pass in your own desired imread function, like tifffile.imread).
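
(For concreteness, a minimal sketch of that second suggestion, not part of the original reply. The data_dir path is hypothetical, and it assumes a directory of identically shaped 3D TIFFs.)

import os
import tifffile
import dask.array.image

data_dir = "path/to/tiffs"  # hypothetical directory of same-shaped 3D TIFFs

# dask.array.image.imread accepts a custom reader via the imread= keyword,
# so tifffile.imread replaces the default scikit-image reader here.
stack = dask.array.image.imread(
    os.path.join(data_dir, "*.tiff"),
    imread=tifffile.imread,
)
print(stack)  # one chunk per file; each file keeps its full 3D shape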

GenevieveBuckley avatar May 17 '21 09:05 GenevieveBuckley

Just stumbled across this as well (and only thought of looking at existing issues after fumbling around for an hour). Is there an easy workaround for reading 3D data other than creating my array from dask.delayed?

Just adding the minimal example I created (not much different from @jni's example), but in my case it is multi-channel data that is causing the same problem:

https://gist.github.com/VolkerH/d0286d606c2850be7d220642a2842806

When you change the shape to (2060, 2048) it works as expected.

VolkerH avatar Nov 08 '21 12:11 VolkerH

Ok, for multi-channel/volumetric files I am now using the following workaround. It most likely comes with a performance penalty.

import pathlib
import dask.array as da
import dask_image.imread

files = sorted(pathlib.Path(".").glob("*.tif"))
lazy_array = da.stack(map(dask_image.imread.imread, files))
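
(As a point of comparison, not from the thread: a minimal sketch of the dask.delayed route mentioned above, assuming every file matches the first file's shape and dtype. The glob pattern is an assumption.)

import glob
import dask
import dask.array as da
import tifffile

files = sorted(glob.glob("*.tif"))
sample = tifffile.imread(files[0])  # one eager read to learn shape and dtype

lazy_arrays = [
    da.from_delayed(
        dask.delayed(tifffile.imread)(f),  # deferred read of each file
        shape=sample.shape,
        dtype=sample.dtype,
    )
    for f in files
]
stack = da.stack(lazy_arrays)  # shape: (len(files),) + sample.shape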

VolkerH avatar Nov 08 '21 12:11 VolkerH

I believe the current workarounds are:

  1. Revert to an earlier version of dask-image, or
  2. Try from dask.array.image import imread

It's on my to-do list to benchmark all these variations and fix the regression. (No timeframe on that, though)

GenevieveBuckley avatar Nov 08 '21 22:11 GenevieveBuckley

More discussion at https://github.com/dask/dask/pull/8385

GenevieveBuckley avatar Nov 16 '21 12:11 GenevieveBuckley

@VolkerH to get over the performance issue with da.stack, here's my example for loading many tiffs using map_blocks:

https://github.com/jni/napari-demos/blob/6860b1fe86a51b30874c34150ae216f4c39b2dd6/rootomics.py#L20-L54
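
(For readers who don't want to follow the link: a minimal, self-contained sketch of the same map_blocks idea, not the linked script itself. It assumes all files share one shape and dtype, and the glob pattern is an assumption.)

import glob
import numpy as np
import tifffile
import dask.array as da

filenames = sorted(glob.glob("*.tif"))
sample = tifffile.imread(filenames[0])  # read one file for shape and dtype

def _read_one(block_info=None):
    # the output chunk index along axis 0 selects which file to read
    t = block_info[None]["chunk-location"][0]
    return tifffile.imread(filenames[t])[np.newaxis, ...]

stack = da.map_blocks(
    _read_one,
    # explicit per-axis chunk sizes: one file per chunk along axis 0
    chunks=((1,) * len(filenames),) + tuple((s,) for s in sample.shape),
    dtype=sample.dtype,
)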

jni avatar Nov 16 '21 12:11 jni