Add imsave function

Open mrocklin opened this issue 6 years ago • 14 comments

I found myself reaching for an imsave function to complement imread. Presumably this would have similar semantics, and would effectively map over the skimage.io.imsave function, or something else in pims.

I don't have a concrete need though, this just came up when writing up an example.

mrocklin, Mar 30 '19 21:03

@jakirkham mentioned the following in a separate conversation:

Does store or to_zarr not work? This is my sense of what people do today.

Could be, I don't actually do this work. So people today don't write many images out? I would expect that for analysis many people would use something like Zarr or HDF as intermediate formats, but that for long time archives, sharing, or publishing people would still want to save to PNG or TIFF or something.
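For reference, a minimal sketch of the `store` route from the quote above. It assumes the target is a preallocated on-disk NumPy array opened with `np.lib.format.open_memmap`, standing in for an HDF5 or Zarr dataset; the file name `result.npy` and the array contents are placeholders, not anything from this thread.

```python
import os
import tempfile

import dask.array as da
import numpy as np

# Placeholder data: a stack of 4 small "images", one image per chunk.
arr = da.from_array(
    np.arange(4 * 16 * 16, dtype="float64").reshape(4, 16, 16),
    chunks=(1, 16, 16),
)

# Assumption: a .npy memmap stands in for the on-disk target. Any object that
# supports NumPy-style slice assignment could be used in its place.
path = os.path.join(tempfile.mkdtemp(), "result.npy")
target = np.lib.format.open_memmap(
    path, mode="w+", dtype=arr.dtype, shape=arr.shape
)

# da.store computes the array chunk by chunk and assigns each chunk
# into the matching region of the target.
da.store(arr, target)
target.flush()

stored = np.load(path)
```

This gives one array container on disk rather than one file per image, which is the gap the PNG/TIFF question is about.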

mrocklin, Mar 31 '19 01:03

So people today don't write many images out? I would expect that for analysis many people would use something like Zarr or HDF as intermediate formats, but that for long time archives, sharing, or publishing people would still want to save to PNG or TIFF or something

Microscope recording software definitely writes out many images today. This is used as input for analysis and is also archived for long term storage. This may also be the thing that is shared with others.

What users produce is dependent on their analysis. One use case is to produce Regions of Interest, which could live happily in JSON. Another use case is to do some cleanup on this data and ingest it into some sort of centralized database. Other use cases produce Zarr/N5 files or HDF5 files, which may be shared and used for further analysis or could go into long term storage.

Publication/sharing may mean hosting the data with a web server, which means having a robust database to back it is pretty important. It could also mean generating some figures in a paper, which are likely generated outside of the analysis pipeline altogether.

jakirkham, Apr 02 '19 00:04

Just FYI. The Satpy project uses Dask to process satellite imagery in a chunk-based fashion. It allows saving results to disk as a GeoTIFF, PNG etc.

https://github.com/pytroll/satpy

RutgerK, Apr 12 '19 07:04

I needed this in my science work and came up with this, based on gufuncs:

import dask.array as da
from skimage.io import imsave

def da_imsave(fnames, arr, compute=False):
    """Write arr to a stack of images assuming
    the last two dimensions of arr as image dimensions.
    
    Parameters
    ----------
    fnames: string
        A format string like 'myfile{:02d}.png'.
        Must accept arr.ndim-2 indices.
    arr: dask.array
        Array of at least 2 dimensions to be written to disk as images
    compute: Boolean (optional)
        whether to write to disk immediately or return a dask.array of the indices to be written
    
    """
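    # One index range per leading (non-image) dimension, chunked like arr
    indices = [da.arange(n, chunks=c) for n, c in zip(arr.shape[:-2], arr.chunksize[:-2])]
    # Array holding the leading index of every image, in a single chunk along the last axis
    index_array = da.stack(da.meshgrid(*indices, indexing='ij'), axis=-1).rechunk({-1: -1})

    @da.as_gufunc(signature=f"(i,j),({arr.ndim-2})->({arr.ndim-2})", output_dtypes=int, vectorize=True)
    def saveimg(image, index):
        # Called once per 2D image, together with that image's leading index
        imsave(fnames.format(*index), image.squeeze())
        return index

    res = saveimg(arr, index_array)
    if compute:
        # Trigger the writes and return the indices that were written
        return res.compute()
    return res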

Would it be useful to build this into a pull request, either here or in dask/dask? What would still be needed for that?

TAdeJong, Aug 06 '19 13:08

Hi @TAdeJong!

What would be needed is: (a) For us to decide how saving should work in dask-image. This could be a bit of a bottleneck. (b) To make a saving function that is a little more general than your example above. You have a few assumptions that probably wouldn't work for everybody (e.g. that you have a 2D image, that the last two dimensions of the array describe spatial dimensions, etc.). Some of this will depend on the result of the discussion in (a).
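As one possible shape for point (b), here is a rough sketch that builds one `dask.delayed` write task per image and leaves the file format to a pluggable `writer` callable. The name `save_planes`, its parameters, and the use of `np.save` as the default writer are all assumptions for illustration; a real version would pass something like `skimage.io.imsave` instead.

```python
import itertools
import os
import tempfile

import dask
import dask.array as da
import numpy as np


def save_planes(pattern, arr, writer=np.save, n_image_dims=2):
    """Build one delayed write task per image in `arr`.

    Hypothetical helper, not part of dask-image. `writer` is any callable
    taking (filename, ndarray); np.save is only a stand-in here.
    """
    lead_shape = arr.shape[: arr.ndim - n_image_dims]
    tasks = []
    for index in itertools.product(*(range(n) for n in lead_shape)):
        fname = pattern.format(*index)
        # arr[index] stays lazy; dask computes it before calling the writer.
        tasks.append(dask.delayed(writer)(fname, arr[index]))
    return tasks


# Placeholder data: 2 time points x 3 channels of 16x16 images.
stack = da.ones((2, 3, 16, 16), chunks=(1, 1, 16, 16))
outdir = tempfile.mkdtemp()
tasks = save_planes(os.path.join(outdir, "img_{:02d}_{:02d}.npy"), stack)
dask.compute(*tasks)
```

Setting `n_image_dims=3` would treat a trailing channel axis as part of each image. This sketch works best when every chunk holds whole images, since each task slices out a single image.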

Re: comments by @mrocklin and @jakirkham : As I see it:

  1. Yes, absolutely people want to write out images, and to a format they are familiar with (like tiff, or similar). I would like us to get something in place for this. I think this group cares about archiving processed data, so prioritizing open and accessible file formats is important, and speed & how compressed the data is on disk are of secondary importance.
  2. ~~Secondary, people probably also want a compressed way to write out data. Maybe something like zarr makes sense here?~~*
  3. ~~Third, there might be people who want to write out to a hierarchical multi-resolution format. We probably don't have the bandwidth for this right now.~~*

I think we should prioritise group 1 ~~with a view to extending to groups 2 (and perhaps 3?) down the track.~~*

*Edit: upon reflection only a small part of this is a plausibly good idea.

GenevieveBuckley, Aug 07 '19 09:08

Hi @GenevieveBuckley,

(a): I was comparing to dask/dask/array/image.py and think it would at least be nice to get similar capabilities for writing out as for reading in. In that sense, I think it might be a good idea to put both reading and writing capabilities in the same place, but beyond that I have no opinion on whether this should be in core dask or in dask-image.

(b): I agree that color/multichannel support is desirable and is not hard to add in this code (via an explicit switch plus guessing based on the last dimension, i.e. whether it has length 3 or 4). For images, I think memory-layout-wise it only makes sense if the last 2 (or 3 in the case of RGB(A)) dimensions are the individual images, so I would assume an explicit transpose/swapaxes by the user would be the way to go there, of course in combination with clear documentation and examples.
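A small sketch of the explicit switch plus last-dimension guess described in (b), together with the user-side axis move. The function name `guess_multichannel` and its `multichannel` argument are made up for this illustration.

```python
import dask.array as da


def guess_multichannel(arr, multichannel=None):
    """Decide whether the trailing axis of `arr` holds colour channels.

    Hypothetical helper: an explicit True/False always wins; otherwise a
    trailing axis of length 3 or 4 is taken to be RGB or RGBA.
    """
    if multichannel is not None:
        return multichannel
    return arr.ndim >= 3 and arr.shape[-1] in (3, 4)


# Placeholder data: a channel-first stack laid out as (time, channel, y, x).
stack = da.zeros((10, 3, 64, 48), chunks=(1, 3, 64, 48))

# The user moves the channel axis to the end before saving: (time, y, x, channel).
channels_last = da.moveaxis(stack, 1, -1)

is_color = guess_multichannel(channels_last)
is_color_before_move = guess_multichannel(stack)
forced_off = guess_multichannel(channels_last, multichannel=False)
```

The guess misfires on a greyscale stack whose images happen to be 3 or 4 pixels wide, which is why the explicit switch is kept.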

Regarding the compressed way to write out data, I wonder if there are any features that would be needed in addition to what dask.array.to_zarr() offers?

TAdeJong, Aug 07 '19 09:08

I do think there's a place for functionality that saves image files (even if it's basic) in dask-image. So we wouldn't replicate functionality that already exists in dask itself (like dask.array.to_zarr()), but we might have something specifically for saving to image-specific formats.

When I say "more than just 2D arrays", I don't only mean that sometimes we have colour channels. As a rough guide, I have to think about:

  • spatial dimensions - typically 2 (image areas) or 3 (volumes)
  • colour channels - not just RGB, or even RGBA. There might be an arbitrary number of colour channels, especially for things like microscopy data (where each corresponds to a different laser wavelength), or geospatial or astronomy data (which also have hyperspectral imaging)
  • a time dimension

So we can expect typical data might have anywhere between 2 and 5 dimensions, and there's often a lot of variety in which order we see those dimensions.
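To make the axis-order problem concrete, here is a sketch that takes an axes string (one letter per dimension) and reorders the array to a fixed order. The helper `to_canonical`, the letter codes, and the choice of "TCZYX" as the target order are assumptions for illustration only, not an existing dask-image convention.

```python
import dask.array as da

CANONICAL = "TCZYX"  # assumed target order: time, channel, then spatial axes


def to_canonical(arr, axes):
    """Reorder `arr` so its axes follow CANONICAL, skipping absent ones.

    `axes` is a string with one letter per dimension, e.g. "YXC".
    Hypothetical helper, not an existing dask-image function.
    """
    axes = axes.upper()
    if len(axes) != arr.ndim or len(set(axes)) != len(axes):
        raise ValueError(f"axes {axes!r} does not match a {arr.ndim}-D array")
    unknown = set(axes) - set(CANONICAL)
    if unknown:
        raise ValueError(f"unknown axis labels: {sorted(unknown)}")
    target = [a for a in CANONICAL if a in axes]
    order = [axes.index(a) for a in target]
    return da.transpose(arr, order), "".join(target)


# Placeholder data laid out as (y, x, channel, time).
volume = da.zeros((64, 48, 3, 7), chunks=(64, 48, 3, 1))
reordered, new_axes = to_canonical(volume, "YXCT")
```

A saving function could then treat the trailing spatial axes as the image and loop over whatever leading axes remain.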

GenevieveBuckley, Aug 07 '19 09:08

Has any progress been made on this? I'm working on image processing and one of the smallest image sizes in my current dataset is 81000 by 31000 pixels. There isn't a quick way to save an array of this size as a PNG.

sumanthratna, Dec 29 '19 16:12

Hi @sumanthratna. No, there hasn't been any activity on this in the last couple of months.

You could try either adapting TAdeJong's script above for your purposes, or look at the Saalfeld lab's N5 library for reading/writing large arrays. It can write to file in parallel, which might help with speed. Caveats: I haven't used this library myself but just chatted to Stephan about it a few months ago; it's still in the early stages so it might not have the features or documentation you need for your project; and image chunks cannot be larger than 2GB which may or may not work for you. Good luck!

GenevieveBuckley, Jan 06 '20 03:01

Just a note that using to_zarr, as in these examples, should also write this out to disk in parallel.

jakirkham, Nov 12 '20 23:11

Related discussion: https://github.com/dask/dask/issues/3487

GenevieveBuckley, Oct 12 '21 05:10

Hello there,

First of all, I want to thank you all for the library, which made my experiment possible. I am quite familiar with Python and would be happy to implement this method. I suppose that it wasn't implemented before because there are some difficulties. May I ask what the major issues/difficulties are?

lrlunin, Mar 14 '22 22:03

I think Genevieve's comment above ( https://github.com/dask/dask-image/issues/110#issuecomment-519009634 ) pretty accurately captures the tricky points that would need to be addressed.

jakirkham, Mar 15 '22 00:03

I'm really appreciating dask-image so far and an imsave/imwrite to e.g. tiff would make it even better.

khyll, Apr 20 '23 09:04