Memory error using dask-image ndmeasure.label
I am obtaining an OOM error when using ndmeasure.label with a large array.
Minimal Complete Verifiable Example:
import dask.array as da
import dask_image.ndmeasure

nx = 5120  # sizes below ~2500 work fine
arr = da.random.random(size=(nx, nx, nx))
darr_bin = arr > 0.8
# The next line fails with an out-of-memory error
label_image, num_labels = dask_image.ndmeasure.label(darr_bin)
Note that the failure already occurs at this last line, i.e., while the task graph is being built, and not when the computation is executed via, e.g., num_labels.compute().
This also means the problem persists when using a (large) cluster, since the OOM always occurs on node 1.
Environment [I could reproduce this problem on several machines; below is one particular environment]:
- Dask version: 2024.9.1
- Python version: Python 3.12.6
- Operating System: macOS 12.2
- Install method (conda, pip, source): conda / mamba
Hey @maxbeegee,
thanks a lot for reporting this and sorry for the late reply here.
I could reproduce the issue. Essentially, dask_image.ndmeasure.label applies scipy.ndimage.label to the individual chunks of the input image, then fuses the per-chunk labels after collecting a list of equivalent labels from examining the chunk boundaries. The current implementation doesn't cope well with a very large number of boundaries to examine.
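To make the fusing step concrete, here is a rough, self-contained sketch of the general two-pass idea, not dask-image's actual code: label each chunk independently, offset the labels so they are globally unique, then merge labels that touch across a chunk boundary. The function label_across_two_chunks, the two-chunk setup, and the use of connected components to resolve equivalences are all illustrative assumptions.

import numpy as np
import scipy.ndimage
import scipy.sparse
import scipy.sparse.csgraph

def label_across_two_chunks(chunk_a, chunk_b):
    # Step 1: label each chunk independently, then offset chunk_b's labels
    # so that all labels are unique across the two chunks.
    labels_a, n_a = scipy.ndimage.label(chunk_a)
    labels_b, n_b = scipy.ndimage.label(chunk_b)
    labels_b[labels_b > 0] += n_a
    n_total = n_a + n_b

    # Step 2: collect pairs of labels that touch on the shared face
    # (the chunks are assumed adjacent along axis 0).
    face_a = labels_a[-1].ravel()
    face_b = labels_b[0].ravel()
    touching = (face_a > 0) & (face_b > 0)
    rows, cols = face_a[touching], face_b[touching]

    # Step 3: treat the touching label pairs as graph edges and take
    # connected components, which resolves transitive equivalences.
    graph = scipy.sparse.coo_matrix(
        (np.ones(rows.size), (rows, cols)), shape=(n_total + 1, n_total + 1)
    )
    _, comp = scipy.sparse.csgraph.connected_components(graph, directed=False)

    # Step 4: remap component ids so the background stays 0 and the
    # merged labels are consecutive starting from 1.
    mapping = np.zeros(n_total + 1, dtype=np.intp)
    seen, next_id = {comp[0]: 0}, 1
    for lab in range(1, n_total + 1):
        c = comp[lab]
        if c not in seen:
            seen[c] = next_id
            next_id += 1
        mapping[lab] = seen[c]
    return mapping[labels_a], mapping[labels_b], next_id - 1

# Illustrative usage on two small random chunks:
rng = np.random.default_rng(0)
a = rng.random((50, 50, 50)) > 0.8
b = rng.random((50, 50, 50)) > 0.8
la, lb, n = label_across_two_chunks(a, b)

The key cost driver is Step 2: the number of boundary faces to examine grows quickly with the chunk count, which is why larger chunks help below.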
Until we improve the implementation, I'd suggest trying a larger chunk size for your input array, e.g.:
nx = 5120
arr = da.random.random(size=(nx, nx, nx), chunks=(800, 800, 800))
darr_bin = arr > 0.8
# With the larger chunks this line now succeeds
label_image, num_labels = dask_image.ndmeasure.label(darr_bin)
The configuration above works well for me. For existing dask arrays you can use arr.rechunk to change the chunk sizes.
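For example, a minimal sketch; the chunk shape here is just an illustration, so pick one that fits your memory budget:

# Coarsen the chunks of an existing dask array before labeling
arr = arr.rechunk((800, 800, 800))
darr_bin = arr > 0.8
label_image, num_labels = dask_image.ndmeasure.label(darr_bin)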
FWIW, there are some ideas for improving the implementation in https://github.com/dask/dask-image/issues/199