
bundle computation of sets of ndmeasure functions

unidesigner opened this issue on May 20, 2021 · 7 comments

If I wanted to compute several of the ndmeasure functions over my image data (similar to scikit-image's `regionprops`), is there a way to bundle them, i.e. to save compute and I/O? I haven't seen this discussed here or elsewhere. Thank you!

unidesigner avatar May 20 '21 11:05 unidesigner

We've discussed adding regionprops in other issues and offline, so thanks for surfacing this as a proper issue. It doesn't exist today, but I agree it is very valuable and seems worthwhile to include.

One thing worth noting that may help in the short term: under the hood we use `labeled_comprehension` to implement these different measurement functions, so it can be a good way to shoehorn in one's own label-based computation.
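As a short illustration (the data and the reduction here are made up, and the sizes are arbitrary), a custom per-label computation can be run through `labeled_comprehension` roughly like this; it mirrors the `scipy.ndimage` function of the same name:

```python
import numpy as np
import dask.array as da
from dask_image import ndmeasure

# Toy stand-ins for a real image / label pair (sizes are arbitrary).
image = da.random.random((512, 512), chunks=(256, 256))
labels = da.from_array(
    np.random.randint(0, 4, size=(512, 512)), chunks=(256, 256)
)
index = [1, 2, 3]  # label values we want results for (0 treated as background)

# Run an arbitrary custom reduction -- here the per-label peak-to-peak
# range -- over the values belonging to each label.
ptp_per_label = ndmeasure.labeled_comprehension(
    image, labels, index,
    lambda values: values.max() - values.min(),
    out_dtype=float,
    default=0.0,
)
print(ptp_per_label.compute())
```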

cc @jni (who also likely has thoughts here 🙂)

jakirkham avatar May 20 '21 20:05 jakirkham

I think some of the holdup for a version of regionprops in dask-image had to do with the `find_objects` function.

But as I recall, there are a fair few 'easy' properties we could be supporting, several tricky ones, and a few very hard things to do. We shouldn't let the hard stuff completely block progress on supporting the easier things.

I was surprised I didn't find more notes about those old discussions with @jni in the issue threads, so thank you for making a dedicated issue for this, @unidesigner.

GenevieveBuckley avatar May 21 '21 05:05 GenevieveBuckley

Thanks for the feedback. When trying to run a single ndmeasure function, I ran into another issue. I load a large 3D zarr image and want to process a large subvolume. The processing used up all the memory and was killed by the OS, so I am not sure how to proceed or where to ask for help. I did not find documentation on how to limit threads/CPU/memory usage or otherwise control dask-image's parallel operations. (Sorry for asking this off-topic here.)
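For reference, the kind of thing I was hoping for is a way to cap dask's parallelism and memory explicitly; something along these lines is what I have been experimenting with (the numbers are arbitrary, and I am not sure this is the intended approach):

```python
import dask
from dask.distributed import Client, LocalCluster

# Option 1: cap the thread count of the default local threaded scheduler
# (the numbers here are arbitrary).
dask.config.set(scheduler="threads", num_workers=4)

# Option 2: run on a local distributed cluster with explicit worker-count
# and per-worker memory limits instead.
cluster = LocalCluster(n_workers=2, threads_per_worker=2, memory_limit="8GB")
client = Client(cluster)
```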

unidesigner avatar May 25 '21 08:05 unidesigner

Currently the ndmeasure functions work by loading all the values for a particular label into a single chunk of memory. So I guess the question is: how large is each of your labels?

jakirkham avatar May 25 '21 21:05 jakirkham

My entire 3D zarr array is about 120k x 70k x 7k voxels (`dask.array.from_zarr`), and I slice out a 10k x 10k x 2000 voxel subvolume (regular slicing syntax), so it's quite a lot of tasks. Individual labels have no more than approximately 1000 voxels, but there are many of them in the subvolume. Would this explain the out-of-memory behavior?
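For scale, a rough back-of-envelope estimate of the subvolume alone (the dtypes are just my guess):

```python
# Back-of-envelope sizes for the 10k x 10k x 2000 subvolume
# (uint8 image and uint32 labels are assumptions).
subvolume_voxels = 10_000 * 10_000 * 2_000      # 2e11 voxels
image_gb = subvolume_voxels * 1 / 1e9           # ~200 GB at 1 byte/voxel
labels_gb = subvolume_voxels * 4 / 1e9          # ~800 GB at 4 bytes/voxel
print(image_gb, labels_gb)                      # 200.0 800.0
```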

unidesigner avatar Jun 01 '21 09:06 unidesigner

This blog post and the links in it might give you a few suggestions for working out where all your memory is going (there is also a small profiling sketch after the links):

  • https://blog.dask.org/2021/03/11/dask_memory_usage
  • https://github.com/itamarst/dask-memusage/
  • https://pythonspeed.com/products/filmemoryprofiler/
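If you are running on the local (non-distributed) schedulers, dask's built-in diagnostics can also record memory over time; a minimal sketch (the workload is a stand-in, and `visualize` needs bokeh installed):

```python
import dask.array as da
from dask.diagnostics import Profiler, ResourceProfiler, visualize

# Arbitrary stand-in computation; replace with the real workload.
x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
result = (x ** 2).mean()

# Record per-task timings and CPU/memory usage while the graph runs.
# These profilers only apply to the local, non-distributed schedulers.
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    result.compute()

# Render an interactive bokeh plot of the recorded usage.
visualize([prof, rprof])
```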

GenevieveBuckley avatar Jun 02 '21 03:06 GenevieveBuckley

Thanks @GenevieveBuckley I will have a look.

unidesigner avatar Jun 02 '21 08:06 unidesigner