dask-image
bundle computation of sets of ndmeasure functions
If I would like to compute several of the ndmeasure functions over my image data (similar to scikit-image's `regionprops`), is there a way to do that, i.e. to save compute and I/O? I haven't seen this discussed here or elsewhere. Thank you!
We've discussed adding regionprops in other issues and offline, so thanks for surfacing this as a proper issue. That doesn't exist today, but I agree this is very valuable and seems worthwhile to include.
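In the meantime, a minimal sketch of one workaround: several ndmeasure results can be evaluated in a single `dask.compute` call so their task graphs are merged and shared work (like reading image chunks) happens once. The toy data below is a placeholder.

```python
import dask
import dask.array as da
import numpy as np
from dask_image import ndmeasure

image = da.random.random((512, 512), chunks=(128, 128))
labels = (image > 0.5).astype(int)   # toy binary label image
index = np.array([1])                # measure label 1 only

means = ndmeasure.mean(image, labels, index)
maxima = ndmeasure.maximum(image, labels, index)
areas = ndmeasure.area(image, labels, index)

# One compute call shares common tasks (e.g. loading chunks) across results
means, maxima, areas = dask.compute(means, maxima, areas)
```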
One thing worth noting that may help in the short term: under the hood we use `labeled_comprehension` to implement these different measurement functions. So this can be a good way to shoehorn in one's own label-based computation.
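For instance, a minimal sketch of a custom per-label statistic via `labeled_comprehension` (the statistic and toy data here are just illustrations):

```python
import numpy as np
import dask.array as da
from dask_image import ndmeasure

image = da.random.random((512, 512), chunks=(128, 128))
labels = da.from_array(
    np.random.randint(0, 5, size=(512, 512)), chunks=(128, 128)
)
index = np.arange(1, 5)  # which labels to measure

def intensity_range(values):
    # values is the 1-D array of image values belonging to one label
    return values.max() - values.min()

result = ndmeasure.labeled_comprehension(
    image, labels, index, intensity_range, out_dtype=float, default=0.0
)
print(result.compute())
```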
cc @jni (who also likely has thoughts here 🙂)
I think some of the holdup for a version of regionprops in dask-image had to do with the `find_objects` function.
But as I recall, there are a fair few 'easy' properties we could be supporting, several tricky ones, and a few very hard things to do. We shouldn't let the hard stuff completely block progress on supporting the easier things.
I was surprised I didn't find more notes about those old discussions with @jni in the issue threads, so thank you for making a dedicated issue for this, @unidesigner.
Thanks for the feedback. When trying to run a single ndmeasure function, I ran into another issue. I load a large 3D zarr image and want to process a large subvolume. The processing used up all the memory and was killed by the OS, so I am not sure how to proceed or where to ask for help. I did not find documentation on how to limit threads/CPU/memory usage or otherwise control dask-image's parallel operations. (Sorry for asking this here off-topic.)
Currently, the ndmeasure functions work by loading all the values for a particular label into a single chunk of memory. So I guess the question is: how large is each of your labels?
My entire 3D zarr array is about 120k x 70k x 7k voxels (`dask.array.from_zarr`), and I slice into a 10k x 10k x 2000 voxel subvolume (regular slicing syntax), which is about 2 x 10^11 voxels, so quite a lot of tasks. Individual labels have no more than approximately 1000 voxels, but there are many of them in the subvolume. Would this explain an out-of-memory behavior?
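A minimal sketch of that setup; the path is hypothetical and the shapes come from the description above:

```python
import dask.array as da

volume = da.from_zarr("volume.zarr")           # ~120k x 70k x 7k voxels
subvolume = volume[:10_000, :10_000, :2_000]   # regular slicing syntax

# 10_000 * 10_000 * 2_000 = 2e11 voxels: ~200 GB even at 1 byte per voxel,
# spread over very many chunks/tasks
print(subvolume.shape, subvolume.numblocks)
```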
This blogpost and links in it might give you a few suggestions for looking at where all your memory is going:
- https://blog.dask.org/2021/03/11/dask_memory_usage
- https://github.com/itamarst/dask-memusage/
- https://pythonspeed.com/products/filmemoryprofiler/
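On limiting threads/CPU/memory: a minimal sketch, assuming running on a `dask.distributed` `LocalCluster` is an option (all limits below are placeholders for your machine):

```python
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=4,             # number of worker processes
    threads_per_worker=1,    # threads per worker
    memory_limit="4GB",      # per-worker cap; workers spill to disk near it
)
client = Client(cluster)

# Build dask-image computations as usual; subsequent .compute() calls
# run on this resource-limited cluster instead of the default scheduler.
```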
Thanks @GenevieveBuckley, I will have a look.