
Adding scikit-image's blob detection (second try)

Open monzelr opened this issue 2 years ago • 8 comments

Hi again. Because I renamed my branch to add-blob-detection, the old PR #411 was closed.

This PR solves issue #398.

Basically, I followed the suggestions by @grlee77; here are the differences from PR #411:

  • replaced the draw.disk and draw.ellipsoid GPU functions with the CPU functions (skimage.draw.disk / skimage.draw.ellipsoid) in the unit tests; the resulting arrays are moved to the GPU via cupy.asarray (see the sketch after this list)
  • removed the submodule cucim.skimage.draw because it is not needed anymore
  • deleted two unit tests because _blob_overlap is now implemented in C++/CUDA rather than in Python
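
A minimal sketch of that test pattern, assuming a cuCIM build that includes this PR (the image size and blob parameters below are illustrative, not the actual unit-test values):

import numpy as np
import cupy as cp
from skimage.draw import disk
from cucim.skimage.feature import blob_log

# draw the reference shape on the CPU with scikit-image ...
img_cpu = np.zeros((256, 256), dtype=np.float64)
rr, cc = disk((128, 128), 15)
img_cpu[rr, cc] = 1.0

# ... then move it to the GPU and run the new blob detector
img_gpu = cp.asarray(img_cpu)
blobs = blob_log(img_gpu, min_sigma=5, max_sigma=20, num_sigma=10)
print(blobs)  # rows of (row, col, sigma)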

Again thanks to @annendominik who helped with the implementation.

And also thank you for your help @grlee77 - this is the first time I have made a PR on GitHub. ;-)

monzelr avatar Sep 26 '22 08:09 monzelr

Can one of the admins verify this patch?

Admins can comment "ok to test" to allow this one PR to run, or "add to allowlist" to allow all future PRs from the same author to run.

GPUtester avatar Sep 26 '22 08:09 GPUtester

ok to test

jakirkham avatar Sep 26 '22 08:09 jakirkham

ok to test

jjacobelli avatar Sep 26 '22 09:09 jjacobelli

Just leaving a comment to say I have been monitoring this PR (I am interested in using GPU-accelerated blob detection algorithms from skimage). I attempted to build cucim with the implementation in this PR and found that it provides the speedup I am looking for, at least for blob_log on 3D images. However, it may be useful to note that while the spots' XYZ coordinates for my images are consistent between this implementation and the base skimage method, the sigma values disagree. I am not sure if this is expected due to the difference in implementation or if the functions (esp. prune_blobs) still need some tweaking. Looking forward to seeing this merged into cucim!!

vbrow29 avatar Oct 05 '22 05:10 vbrow29

@vbrow29 Thank you for this hint - I could not properly test the sigma values in 3D because I do not have any 3D data (normally this is covered in the unit tests, which I can't run because I am on Windows). The sigma values should match between both implementations. I guess I have an error in the CUDA code which I have to debug.

Is it possible for you to provide any 3D image data so I can have a look at those sigma values?

monzelr avatar Oct 05 '22 08:10 monzelr

@monzelr sure thing. I am not sure what the best method is for sending the files to you; if you reach out to me at [email protected], I can transfer the files there. Or, if preferred, I can create a repo / Google Drive folder containing the 3D test image I was using and the blob_log parameters. Let me know which is best!

vbrow29 avatar Oct 05 '22 19:10 vbrow29

@vbrow29: just sent a request via email

monzelr avatar Oct 05 '22 20:10 monzelr

@vbrow29 : I just looked at your provided 3D data and the differences in the sigma values.

The problem is a kind of error propagation; here is a summary:

  • cucim.skimage.util.img_as_float returns np.float32 while skimage.util.img_as_float returns np.float64 - this may be a bug, but it is not the reason for the difference in sigma values (I will file a bug report to clarify this; see the sketch after this list)
  • cupyx.scipy.ndimage.gaussian_laplace behaves differently from scipy.ndimage.gaussian_laplace. This may be due to the different implementations and the different CPU / GPU architectures - I did not investigate this further. During debugging, I see different values in the image_cube variable in cucim.skimage.feature.blob.blob_log, which results from the gaussian_laplace function
  • cucim.skimage.feature.peak.peak_local_max also behaves a bit differently from skimage.feature.peak.peak_local_max. I guess this is, like the gaussian_laplace function, a matter of the different architectures and floating-point precision, or maybe also a slightly different implementation
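
A minimal sketch of the dtype check described in the first bullet, assuming a uint8 input image (the float32 result on the GPU side is what is reported above, not something I have re-verified here):

import numpy as np
import cupy as cp
import skimage.util
import cucim.skimage.util

img = (np.random.rand(64, 64) * 255).astype(np.uint8)

print(skimage.util.img_as_float(img).dtype)                     # float64 on the CPU
print(cucim.skimage.util.img_as_float(cp.asarray(img)).dtype)   # float32 per the observation above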

Overall, I can see that blob_log finds the blobs, but with different sigma values. The blob detection algorithm always chooses the blob with the highest sigma values, and the cucim.skimage.feature.blob.blob_log implementation tends to find larger blobs than the skimage implementation.

I know this is a somewhat frustrating answer, because the cucim algorithm does not behave exactly like the skimage algorithm even though we use identically named functions...

What you can do to get the right sigma values is to find the blobs with the cucim implementation and then iterate over the blob regions to determine the skimage sigma values; see the sketch below.
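
One possible way to implement that workaround, assuming a cuCIM build that includes this PR; the helper name, the crop size (pad), and the blob parameters are assumptions for illustration only:

import numpy as np
import cupy as cp
from skimage.feature import blob_log as blob_log_cpu
from cucim.skimage.feature import blob_log as blob_log_gpu

def refine_sigmas(image_gpu, min_sigma=1, max_sigma=10, num_sigma=10, pad=4):
    # detect the blobs on the GPU first
    blobs = cp.asnumpy(blob_log_gpu(image_gpu, min_sigma=min_sigma,
                                    max_sigma=max_sigma, num_sigma=num_sigma))
    image_cpu = cp.asnumpy(image_gpu)
    refined = []
    for *coords, sigma in blobs:
        # crop a window around the GPU-detected blob and let skimage
        # pick a sigma from the same discrete sigma grid
        half = int(np.ceil(pad * sigma))
        slices = tuple(slice(max(int(c) - half, 0), int(c) + half + 1)
                       for c in coords)
        local = blob_log_cpu(image_cpu[slices], min_sigma=min_sigma,
                             max_sigma=max_sigma, num_sigma=num_sigma)
        if len(local):
            # assuming the crop contains a single blob, keep its CPU sigma
            refined.append((*coords, local[0][-1]))
        else:
            refined.append((*coords, sigma))
    return np.array(refined)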


Also note: the sigma values are generated like this:

sigma_list = cupy.linspace(min_sigma, max_sigma, num_sigma)

So basically we have discrete sigma values, and every blob detection algorithm returns sigma values from this sigma_list.
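
For example, with assumed parameters min_sigma=1, max_sigma=10, num_sigma=10, the only sigma values a blob can be reported with are:

import cupy
print(cupy.linspace(1, 10, 10))  # [ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]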

monzelr avatar Oct 06 '22 10:10 monzelr

Thank you @grlee77!

monzelr avatar Nov 17 '22 09:11 monzelr

Made a few performance-related changes to:

  • avoid various intermediate arrays to reduce memory overhead
  • create sigma_list using basic tuple operations to avoid the overhead of calling numpy or cupy functions on tiny arrays of length image.ndim (see the sketch after this list)
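
An illustrative sketch of that idea (not the PR's exact code): broadcast the per-axis sigmas with plain Python tuples instead of tiny NumPy/CuPy arrays of length image.ndim.

def _broadcast_sigma(sigma, ndim):
    # scalar -> repeat per axis; sequence -> plain tuple of floats
    if isinstance(sigma, (int, float)):
        return (float(sigma),) * ndim
    return tuple(float(s) for s in sigma)

min_sigma = _broadcast_sigma(1, 3)           # (1.0, 1.0, 1.0)
max_sigma = _broadcast_sigma((4, 4, 2), 3)   # (4.0, 4.0, 2.0)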

Initial benchmarking looks great. The results below can be generated by running the script in the benchmarks/skimage folder, e.g. to benchmark blob_dog for shape (3840, 2160) and float32 dtype, with repeated runs over a 5-second interval:

python cucim_feature_bench.py -f blob_dog -i 3840,2160 -d float32 -t 5

2D results

Seeing 100x, 132x, and 41x speedups for blob_dog, blob_log, and blob_doh, respectively, on a 4K float32 image.

shape                                                 (3840, 2160)
function_name                                             blob_dog
dtype                                                      float32
ndim                                                           2.0
GPU accel                                               100.031775
CPU: host (mean)                                          6.822972
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.068201
GPU: host (std)                                            0.00036
GPU: device (mean)                                        0.068208
GPU: device (std)                                          0.00036
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                                 (3840, 2160)
function_name                                             blob_log
dtype                                                      float32
ndim                                                           2.0
GPU accel                                               132.400334
CPU: host (mean)                                         35.929339
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.271333
GPU: host (std)                                           0.017035
GPU: device (mean)                                        0.271369
GPU: device (std)                                         0.017038
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                                 (3840, 2160)
function_name                                             blob_doh
dtype                                                      float32
ndim                                                           2.0
GPU accel                                                41.072756
CPU: host (mean)                                           5.61557
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.136709
GPU: host (std)                                           0.001834
GPU: device (mean)                                        0.136722
GPU: device (std)                                         0.001838
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

3D results

Seeing >130x speedups for blob_dog and blob_log on a (192, 192, 192) float32 volume.

shape                                              (192, 192, 192)
function_name                                             blob_dog
dtype                                                      float32
ndim                                                           3.0
GPU accel                                               133.993833
CPU: host (mean)                                         15.755103
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.117573
GPU: host (std)                                           0.001517
GPU: device (mean)                                        0.117581
GPU: device (std)                                         0.001517
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

shape                                              (192, 192, 192)
function_name                                             blob_log
dtype                                                      float32
ndim                                                           3.0
GPU accel                                               136.752506
CPU: host (mean)                                         59.750352
CPU: host (std)                                                0.0
GPU: host (mean)                                          0.436915
GPU: host (std)                                            0.01022
GPU: device (mean)                                        0.436923
GPU: device (std)                                          0.01022
GPU: DEV Name                                   [NVIDIA RTX A6000]
CPU: DEV Name         [Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz]

grlee77 avatar Nov 17 '22 15:11 grlee77

rerun tests

grlee77 avatar Nov 17 '22 18:11 grlee77

CI failures are unrelated to this PR (apparently caused by a new version of CMake released today).

grlee77 avatar Nov 17 '22 18:11 grlee77

@gpucibot merge

grlee77 avatar Nov 18 '22 13:11 grlee77