Adding scikit-image's blob detection (second try)
Hi again,
due to renaming my branch to `add-blob-detection`, the old PR #411 was closed.
This PR solves issue #398.
Basically, I followed the suggestions by @grlee77; here is the difference from PR #411:
- replaced the `draw.disk` and `draw.ellipsoid` GPU functions with the CPU functions (`skimage.draw.disk` / `ellipsoid`) in the unit tests; the resulting matrices are moved to the GPU via `cupy.asarray` (see the sketch after this list)
- removed the submodule `cucim.skimage.draw` because it is not needed anymore
- deleted two unit tests because `_blob_overlap` is implemented in C++/CUDA and not in Python anymore
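For context, a minimal sketch of that test pattern, assuming a simple 2D disk image (the image size and radius here are placeholders, not the values used in the actual tests):

```python
# Sketch of the unit-test pattern described above: draw a test image with the
# CPU-side skimage.draw functions, then move it to the GPU with cupy.asarray.
import cupy as cp
import numpy as np
from skimage.draw import disk

# build a 2D test image on the CPU containing a single filled disk
image = np.zeros((128, 128), dtype=np.float64)
rr, cc = disk((64, 64), radius=10)
image[rr, cc] = 1.0

# move it to the GPU; the cucim functions under test operate on this array
image_gpu = cp.asarray(image)
```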
Again, thanks to @annendominik, who helped with the implementation.
And also thank you for your help @grlee77 - this is the first time I have done a PR on GitHub. ;-)
Can one of the admins verify this patch?
Admins can comment `ok to test` to allow this one PR to run, or `add to allowlist` to allow all future PRs from the same author to run.
ok to test
Just leaving a comment to say I have been monitoring this PR (I am interested in using GPU-accelerated blob detection algorithms from skimage). I took a shot at building cucim with the implementation in this PR and found it to provide the speedup I am looking for, at least for `blob_log` on 3D images. However, I thought it may be useful to note that while the spots' XYZ coordinates for my images are consistent between this implementation and the base skimage method, the sigma values disagree. I am not sure if this is expected due to the difference in implementation or if the functions (esp. `prune_blobs`) still need some tweaking. Looking forward to seeing this merged into cucim!
@vbrow29 Thank you for this hint. I could not appropriately test the sigma values in 3D because I do not have any 3D data (normally this is covered by the unit tests, which I cannot run because I am on Windows). The sigma values should match between both implementations; I guess I have an error in the CUDA code, which I will have to debug.
Is it possible for you to provide some 3D image data so I can have a look at those sigma values?
@monzelr sure thing. I am not sure what the best method is for sending you the files; if you reach out to me at [email protected], I can transfer them there. Or, if preferred, I can create a repo / Google Drive folder containing the 3D test image I was using and the `blob_log` parameters. Let me know which is best!
@vbrow29: just sent a request via mail
@vbrow29: I just looked at your provided 3D data and at the differences in the sigma values.
The problem is a kind of error propagation; here is a summary:

- `cucim.skimage.util.img_as_float` returns `np.float32` while `skimage.util.img_as_float` returns `np.float64`. This is maybe a bug, but it is not the reason for the difference in sigma values (I will file a bug report to clarify this); a minimal check is sketched after this list.
- `cupyx.scipy.ndimage.gaussian_laplace` behaves differently from `scipy.ndimage.gaussian_laplace`. This is maybe due to the different implementations and the different CPU/GPU architectures; I did not investigate this further. During debugging, I see different values in the `image_cube` variable in `cucim.skimage.feature.blob.blob_log`, which results from the `gaussian_laplace` function.
- `cucim.skimage.feature.peak.peak_local_max` also behaves a bit differently from `skimage.feature.peak.peak_local_max`. As with `gaussian_laplace`, I guess this is a problem of differing architecture and floating-point precision, or maybe also of slightly different implementations.
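For illustration, a minimal check of the dtype difference from the first point, assuming a small uint8 test array (the printed dtypes are the ones described above):

```python
# Minimal reproduction of the img_as_float dtype difference noted above.
import cupy as cp
import numpy as np
import cucim.skimage.util
import skimage.util

img = np.zeros((4, 4), dtype=np.uint8)

print(skimage.util.img_as_float(img).dtype)                    # float64 on the CPU
print(cucim.skimage.util.img_as_float(cp.asarray(img)).dtype)  # float32 on the GPU, as noted
```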
Overall, I can see that the `blob_log` function finds the blobs with different sigma values. The blob detection algorithm always chooses the blob with the highest sigma value, and the `cucim.skimage.feature.blob.blob_log` implementation tends to find larger blobs than the `skimage` implementation.
I know this is a somewhat frustrating answer, because the cucim algorithm does not behave exactly like the skimage algorithm even though we use identically named functions...
What you can do to get the right sigma values is to detect the blobs with the `cucim` implementation and then iterate over the blob regions to determine the `skimage` sigma values; a rough sketch of this workaround follows below.
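A rough sketch of that workaround, assuming the PR's `cucim.skimage.feature.blob_log` mirrors skimage's `blob_log` API (the synthetic volume, the crop radius, and the choice of which local detection to keep are all placeholders):

```python
# Hypothetical workaround: detect blob positions on the GPU with cucim, then
# re-run skimage's blob_log on a small CPU crop around each detection to
# recover the CPU implementation's sigma estimate.
import cupy as cp
import numpy as np
from cucim.skimage.feature import blob_log as blob_log_gpu
from skimage.feature import blob_log as blob_log_cpu

image = np.random.rand(64, 64, 64).astype(np.float32)  # placeholder 3D volume
blobs = cp.asnumpy(blob_log_gpu(cp.asarray(image),
                                min_sigma=1, max_sigma=10, num_sigma=10))

refined = []
for z, y, x, sigma in blobs:          # 3D rows: (z, y, x, sigma)
    r = int(4 * sigma) + 1            # crop radius: a few sigma around the blob
    crop = image[tuple(slice(max(int(c) - r, 0), int(c) + r + 1)
                       for c in (z, y, x))]
    local = blob_log_cpu(crop, min_sigma=1, max_sigma=10, num_sigma=10)
    if len(local) > 0:
        refined.append((z, y, x, local[0][-1]))  # keep skimage's sigma value
```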
Also note: the sigma values are generated like this:

```python
sigma_list = cupy.linspace(min_sigma, max_sigma, num_sigma)
```

So basically we have discrete sigma values, and every blob detection algorithm returns sigma values from this `sigma_list`.
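For illustration, with hypothetical parameters `min_sigma=1`, `max_sigma=10`, `num_sigma=10` the discreteness is easy to see:

```python
import cupy

# ten evenly spaced candidate scales; a detected blob's sigma is always one
# of exactly these values, never anything in between
sigma_list = cupy.linspace(1, 10, 10)
print(sigma_list)  # [ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
```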
Thank you @grlee77!
Made a few performance-related changes to:

- avoid various intermediate arrays, to reduce memory overhead
- create `sigma_list` using basic tuple operations, to avoid the overhead of calling numpy or cupy functions on tiny arrays of length `image.ndim` (see the sketch after this list)
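A minimal sketch of what such a tuple-based construction could look like; this illustrates the idea rather than reproducing the PR's actual code:

```python
# Hypothetical tuple-based equivalent of linspace for per-axis sigmas:
# plain Python arithmetic avoids launching numpy/cupy kernels on tiny
# arrays whose length is only image.ndim.
min_sigma = (1.0, 1.0, 1.0)    # per-axis minimum sigma (ndim == 3)
max_sigma = (10.0, 10.0, 10.0)
num_sigma = 10

sigma_list = tuple(
    tuple(lo + (hi - lo) * i / (num_sigma - 1)
          for lo, hi in zip(min_sigma, max_sigma))
    for i in range(num_sigma)
)
# sigma_list[0] == (1.0, 1.0, 1.0), sigma_list[-1] == (10.0, 10.0, 10.0)
```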
Initial benchmarking looks great. The results below can be generated by running the script in the `benchmarks/skimage` folder, e.g. to benchmark `blob_dog` for shape (3840, 2160) and float32 dtype, with repeated runs over a 5-second interval:

```
python cucim_feature_bench.py -f blob_dog -i 3840,2160 -d float32 -t 5
```
2D results

Seeing 100x, 132x, and 41x speedups for `blob_dog`, `blob_log`, and `blob_doh`, respectively, on a 4k float32 image (shape (3840, 2160), ndim 2; GPU: NVIDIA RTX A6000, CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz):

| metric | blob_dog | blob_log | blob_doh |
| --- | --- | --- | --- |
| GPU accel | 100.031775 | 132.400334 | 41.072756 |
| CPU: host (mean) | 6.822972 | 35.929339 | 5.61557 |
| CPU: host (std) | 0.0 | 0.0 | 0.0 |
| GPU: host (mean) | 0.068201 | 0.271333 | 0.136709 |
| GPU: host (std) | 0.00036 | 0.017035 | 0.001834 |
| GPU: device (mean) | 0.068208 | 0.271369 | 0.136722 |
| GPU: device (std) | 0.00036 | 0.017038 | 0.001838 |
3D results

Seeing >130x speedups for `blob_dog` and `blob_log` on a (192, 192, 192) float32 volume (same GPU/CPU as above):

| metric | blob_dog | blob_log |
| --- | --- | --- |
| GPU accel | 133.993833 | 136.752506 |
| CPU: host (mean) | 15.755103 | 59.750352 |
| CPU: host (std) | 0.0 | 0.0 |
| GPU: host (mean) | 0.117573 | 0.436915 |
| GPU: host (std) | 0.001517 | 0.01022 |
| GPU: device (mean) | 0.117581 | 0.436923 |
| GPU: device (std) | 0.001517 | 0.01022 |
rerun tests
The CI failures are unrelated to this PR (apparently caused by a new version of CMake released today).
@gpucibot merge