dask-image
Feature request: pyramid_gaussian() ported from scikit-image
- dask-image version: 0.2.0
- Python version: 3.6.8
- Operating System: macOS 10.15.5 Beta
Description
Right now I have a standard 2-D image, stored as a dask array. It's very large, since it's loaded from an SVS file, and I'd like to view it in napari. Since it's so large, I'd like to take advantage of napari's pyramid layers, which means I need to convert the array to a pyramidal form. `skimage.transform.pyramid_gaussian` can do this, but it eats up so much of my memory that the script gets SIGKILLed.
What I Did
```python
import napari
import skimage.transform

with napari.gui_qt():
    da_img = ...  # very large 2-D dask array
    napari.view_image(list(skimage.transform.pyramid_gaussian(da_img, multichannel=True)),
                      name='image', is_pyramid=True)
```
and the script uses up all my memory and eventually gets SIGKILLed.
This might be because `skimage.transform.pyramid_gaussian` calls `np.asarray`:
https://github.com/scikit-image/scikit-image/blob/ad15736162a216e46834cb4722914bfa01aeb3ae/skimage/transform/pyramids.py#L145-L224
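For context on why that would blow up memory, here's a small illustrative snippet (array sizes arbitrary): calling `np.asarray` on a dask array forces the whole lazy array to be computed into a single in-memory ndarray.

```python
import dask.array as da
import numpy as np

lazy = da.zeros((4096, 4096), chunks=(1024, 1024))  # stays lazy, nothing in RAM yet
materialized = np.asarray(lazy)  # __array__ computes every chunk into one in-memory ndarray
print(type(materialized))  # <class 'numpy.ndarray'>
```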
Note that this is a feature request and not a bug report.
Thanks for the issue! We've coincidentally been discussing this in napari. As a stopgap, note that dask has a `coarsen` function (`dask.array.coarsen`) that can be used to create pyramids:
https://docs.dask.org/en/latest/array-api.html#dask.array.coarsen
It won't look as nice as a Gaussian pyramid, but it'll get the job done.
I'll also note that you should probably save your pyramid (e.g. with `to_zarr`) and then load it, as `coarsen` will still need to go through your entire dataset before computing the coarsened array, which will be slow.
Finally, I think it should not be too hard to port the skimage `pyramid_gaussian` to dask_image, since the raw ingredient of Gaussian filtering is already available.
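For what it's worth, a tiny illustrative snippet of that ingredient in use; the array size is arbitrary, and sigma = 2/3 roughly mirrors skimage's `2 * downscale / 6` default for a downscale factor of 2:

```python
import dask.array as da
import dask_image.ndfilters

img = da.random.random((8192, 8192), chunks=(1024, 1024))

# Lazy Gaussian smoothing of a chunked array; nothing is computed until the
# result is materialised (e.g. by .compute() or a viewer pulling tiles).
smoothed = dask_image.ndfilters.gaussian_filter(img, sigma=2 / 3)
```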
Thanks for the reply, @jni! Sorry, I forgot to mention that I knew about `dask.array.coarsen`. It doesn't seem too obvious how to make `coarsen` do what I want; I think `np.mean` should work, but what do I pass for `axes`?
And yes, saving the pyramid to zarr is definitely better, but for user convenience I'd like to be able to compute the pyramid.
Ultimately, I think either `dask_image` or `napari` should add functionality for this; I created this issue here instead of in `napari` because this might be helpful to users who aren't using `napari` but are using `dask_image`.
@sumanthratna the `axes` argument is a dictionary of `{axis_index: downsampling_factor}`, e.g. `{0: 2, 1: 2}` to downsample by 2 along the 0th and 1st axes. Does that make sense?
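To make that concrete, here's a rough, hypothetical sketch (the array size, chunking, and level cutoff are made up) that builds a mean-coarsened pyramid with that axes dictionary and persists each level with `to_zarr`, as suggested above:

```python
import dask.array as da
import numpy as np

da_img = da.random.random((16384, 16384), chunks=(2048, 2048))  # stand-in for the large SVS-backed array

pyramid = [da_img]
while min(pyramid[-1].shape) > 512:
    # Downsample by 2 along axes 0 and 1; trim_excess drops edge pixels
    # that don't fill a complete 2x2 block.
    pyramid.append(da.coarsen(np.mean, pyramid[-1], {0: 2, 1: 2}, trim_excess=True))

# Persist each level so later reads don't recompute the whole chain.
for level, arr in enumerate(pyramid):
    da.to_zarr(arr, "pyramid.zarr", component=str(level), overwrite=True)
```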
Just bumping this. This would be quite a useful dask-image functionality to have when generating multiscale `.ome.zarr`/NGFF files from very large arrays.
Here's a rough sketch. We could do this by wrapping the same generator pattern around `dask.array.blockwise`.
Example
```python
import dask.array as da
import numpy as np
from skimage.transform import pyramid_reduce


def dask_pyramid_gaussian(image, max_layer=-1, downscale=2, sigma=None, order=1,
                          mode='reflect', cval=0, multichannel=False,
                          preserve_range=False, *, channel_axis=-1, **kwargs):
    layer = 0
    current_shape = image.shape
    prev_layer_image = image
    yield image

    # build downsampled images until max_layer is reached or the downscale
    # process does not change the image size
    while layer != max_layer:
        layer += 1

        layer_image = da.blockwise(
            pyramid_reduce, 'ij', prev_layer_image, 'ij',
            downscale=downscale, sigma=sigma, order=order, mode=mode, cval=cval,
            dtype=float,
            adjust_chunks={'i': lambda n: n / downscale,
                           'j': lambda n: n / downscale},
        )

        prev_shape = np.asarray(prev_layer_image.shape)
        prev_layer_image = layer_image
        current_shape = np.asarray(layer_image.shape)

        # no change to previous pyramid layer
        if np.all(current_shape == prev_shape):
            break

        yield layer_image
```
And here's a super tiny example of it in action. We could also use some of the Aperio example data to test it.
```python
from skimage.data import camera

arr = da.from_array(camera(), chunks=(128, 128))
out = list(dask_pyramid_gaussian(arr))
out
```
```
[dask.array<array, shape=(512, 512), dtype=uint8, chunksize=(128, 128), chunktype=numpy.ndarray>,
 dask.array<pyramid_reduce, shape=(256, 256), dtype=float64, chunksize=(64, 64), chunktype=numpy.ndarray>,
 dask.array<pyramid_reduce, shape=(128, 128), dtype=float64, chunksize=(32, 32), chunktype=numpy.ndarray>,
 dask.array<pyramid_reduce, shape=(64, 64), dtype=float64, chunksize=(16, 16), chunktype=numpy.ndarray>,
 dask.array<pyramid_reduce, shape=(32, 32), dtype=float64, chunksize=(8, 8), chunktype=numpy.ndarray>,
 dask.array<pyramid_reduce, shape=(16, 16), dtype=float64, chunksize=(4, 4), chunktype=numpy.ndarray>,
 dask.array<pyramid_reduce, shape=(8, 8), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray>,
 dask.array<pyramid_reduce, shape=(4, 4), dtype=float64, chunksize=(1, 1), chunktype=numpy.ndarray>,
 dask.array<pyramid_reduce, shape=(0, 0), dtype=float64, chunksize=(0, 0), chunktype=numpy.ndarray>]
```
Limitations
There are a few key limitations here:
- it's not n-dimensional (currently a 2D greyscale example)
- it still needs `multichannel`/`channel_axis` support, too
- the smallest layer shape and chunk size is being reported as `(0, 0)`, which is not correct (this happens when `adjust_chunks` tries to downsample an array that already has the smallest chunk size possible: `(1, 1)`)
- performance considerations. One thing we may need to change is that, as we reduce the image down, we might also want to have fewer chunks for those smaller layers too. Lots of tiny chunks can impact performance (but then again, so will a lot of rechunking calls), so we'll need to be thoughtful and pick some good defaults (a possible mitigation is sketched below this list).
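As a rough illustration of that last point (the target chunk size of 2048 is an arbitrary choice, not a recommendation), each level could be rechunked back toward a fixed chunk size as it is produced:

```python
import dask.array as da


def rechunk_level(layer_image, target_chunk=2048):
    # As the image shrinks, its chunks shrink too; merging them back up toward
    # a target size keeps the number of tasks per level roughly constant.
    new_chunks = tuple(min(target_chunk, s) for s in layer_image.shape)
    return layer_image.rechunk(new_chunks)
```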
Questions
We'll need to test how good the baseline performance is with napari. Perhaps @sumanthratna or @jni could try this out.
Note: even if the performance is sluggish, there might still be a place for this. It'd be useful in situations where you have data, but don't know which files are the interesting ones you plan to analyze later (so you don't want to go to all the effort of processing ALL the files into a pyramidal format). Also, we can make sure it's integrated well with workflows to save out a pyramidal file to disk (then you can do the initial exploration, decide it's worthwhile saving as a pyramid for better performance, and easily continue with your work).
I've been using dask for multiresolution pyramids with my own tiny library (xarray-multiscale) for a few months. At the moment I'm using `dask.array.coarsen`, so I don't do gaussian downsampling, but I would be happy to add a dependency on dask-image if the gaussian pyramid support gets added.
A few notes from my own experience:
- In practice, you really want to use `xarray.DataArray`s to represent the different levels of the pyramid. In terms of transformations, downscaling applies a scaling AND a small translation to the bounding box of the data. You cannot track this explicitly with a raw dask / numpy array, which can lead to a bad time. (A small illustration follows after this list.)
- You might want some flexibility in how you schedule the computation. For linear downscaling methods like gaussian smoothing, you can compute layer K as a function of layer K-1. But for nonlinear downscaling methods like the windowed mode (which you would use for downscaling labelled images, where averaging is meaningless) you must compute layer K as a function of layer 0.
- The ideal chunking for the different levels is a constant issue, and it really depends on what kind of storage you want to write to.
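Here's a small illustrative sketch of the first point (array size and window factors are arbitrary): with `xarray`, coarsening averages the coordinate arrays as well, so both the scale change and the half-pixel translation are tracked for you.

```python
import numpy as np
import xarray as xr

img = xr.DataArray(
    np.random.random((256, 256)),
    dims=("y", "x"),
    coords={"y": np.arange(256), "x": np.arange(256)},  # pixel centres at 0, 1, 2, ...
)

# Windowed mean over 2x2 blocks; the coordinates are averaged too, so the new
# pixel centres land at 0.5, 2.5, 4.5, ... (scale 2 plus a 0.5-pixel translation).
level_1 = img.coarsen(y=2, x=2).mean()
print(level_1.y.values[:3])  # [0.5 2.5 4.5]
```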
> At the moment I'm using `dask.array.coarsen`, so I don't do gaussian downsampling, but I would be happy to add a dependency on dask-image if the gaussian pyramid support gets added.
What would you need on your end? I'm assuming you'd want support for `pyramid_reduce`/`pyramid_expand` in dask-image; anything else important?
It's a good point about keeping track of change in pixel locations.
In my application I'm only ever downscaling, never upscaling, so I would use `pyramid_reduce` if it was available.
Hi @GenevieveBuckley and @d-v-b, thanks for your suggestions. I haven't been able to try this out yet, but I second @d-v-b, I would only use this for downscaling.
In fact, looking at the skimage source code for pyramid_gaussian, I find it overly complicated for my use case.
I would probably be happy with subsampling after smoothing, i.e. applying a spatial low-pass filter (typically Gaussian) to prevent aliasing and then sampling at every second array position. I don't care too much about rounding errors in the sizes of the lower pyramid levels.
I'm not sure what to do about chunk sizes for the different pyramid levels; they could even remain at a fixed size.
The use case here is to create a multi-resolution .ome.zarr file for viewing whole microscopy slide scans efficiently in napari.
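A minimal sketch of that smooth-then-subsample idea, assuming a 2-D dask array and a fixed factor of 2 (the function name, level count, and sigma choice are illustrative, not an existing dask-image API):

```python
import dask_image.ndfilters


def smooth_then_subsample_pyramid(image, max_layer=4, sigma=2 / 3):
    # Yield the original array, then successively halved versions of it.
    # Each level is Gaussian low-pass filtered (to limit aliasing) and then
    # sampled at every second position along both axes; everything stays lazy.
    yield image
    for _ in range(max_layer):
        if min(image.shape) < 2:
            break
        image = dask_image.ndfilters.gaussian_filter(image, sigma=sigma)[::2, ::2]
        yield image
```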
Also interested in these features, and happy to help getting them implemented.
Our use case is constructing OME-Zarr pyramids from any given high-resolution (multichannel) image to easily visualize them using tools like napari, Viv, or Vizarr. In OME-Zarr, chunks are typically the same size across the entire pyramid, though I don't think this is strictly required by the specification.
Similar to the others, we would use these features exclusively for downscaling, though not necessarily using Gaussian smoothing. Performance is of secondary concern for us at the moment since this will be used in an offline context.
There is a tangentially related discussion about downsampling happening here (might be of interest to some people in this thread, for instance @VolkerH).
Hi,
I can't join the discussion on Twitter as I don't want to create an account with them. Striding by itself (`[::2]`, a.k.a. subsampling) without a prior low-pass filter leads to aliasing artifacts.
As we don't need arbitrary downscaling factors, we resorted to Gaussian smoothing with dask-image followed by subsampling, which works well. Implementing the whole pyramid functionality from skimage turned out to be overkill for this process.
A colleague on my team has meanwhile implemented an `.ome.zarr` write function that takes a 5D dask array (`tczyx`) as input and outputs a pyramid using said downscaling approach. I hope this can be open-sourced, but that needs some more discussion in our team, so I can't give a timeline.