cucim icon indicating copy to clipboard operation
cucim copied to clipboard

Improve performance of Euclidean distance transform

Open grlee77 opened this issue 2 years ago • 0 comments

This PR gives a further 2-4x speedup to the 2D and 3D distance transform functions by creating dedicated elementwise CUDA kernels for the pre- and post-processing steps. These include, for example, packing and upacking the 3D coordinates into a single integer. This resolves various "TODO" comments that were in the source, but I don't think we have an issue open for this.

Aside from the performance benefit there is also a substantial reduction in memory usage for the default case of return_distance=True, return_indices=False as in this case the explicit arrays of indices never need to be created.

Benchmarks

Implementation from release 22.08.01

shape return dist. return ind. SciPy ITK cuCIM accel. (CuPy vs SciPy) accel. (CuPy vs ITK)
256, 256 False True 0.00379 ---- 0.00081 4.66 ----
256, 256 True False 0.00428 0.00383 0.00102 4.21 3.76
256, 256 True True 0.00429 ---- 0.00107 4.02 ----
512, 512 False True 0.01724 ---- 0.00091 18.97 ----
512, 512 True False 0.01866 0.00788 0.00104 17.94 7.58
512, 512 True True 0.01876 ---- 0.00107 17.48 ----
2048, 2048 False True 0.35053 ---- 0.00256 137.03 ----
2048, 2048 True False 0.39468 0.08220 0.00288 137.02 28.54
2048, 2048 True True 0.39356 ---- 0.00296 133.04 ----
4096, 4096 False True 1.57430 ---- 0.00736 213.85 ----
4096, 4096 True False 1.79513 0.54968 0.00858 209.34 64.10
4096, 4096 True True 1.81570 ---- 0.00889 204.25 ----
32, 32, 32 False True 0.00319 ---- 0.00069 4.62 ----
32, 32, 32 True False 0.00356 0.00481 0.00096 3.71 5.01
32, 32, 32 True True 0.00360 ---- 0.00103 3.49 ----
128, 128, 128 False True 0.27901 ---- 0.00131 213.20 ----
128, 128, 128 True False 0.31013 0.04504 0.00156 199.32 28.94
128, 128, 128 True True 0.31289 ---- 0.00163 192.07 ----
256, 256, 256 False True 3.02435 ---- 0.00696 434.25 ----
256, 256, 256 True False 3.34077 0.33528 0.00855 390.57 39.20
256, 256, 256 True True 3.32920 ---- 0.00915 363.89 ----
384, 384, 384 False True 10.68973 ---- 0.02284 468.07 ----
384, 384, 384 True False 11.73340 1.16172 0.02809 417.65 41.35
384, 384, 384 True True 11.68002 ---- 0.03007 388.40 ----

Proposed Implementation

shape return dist. return ind. SciPy ITK cuCIM accel. (CuPy vs SciPy) accel. (CuPy vs ITK)
256, 256 False True 0.00385 N/A 0.00021 18.49 N/A
256, 256 True False 0.00433 0.00370 0.00021 20.84 17.83
256, 256 True True 0.00436 N/A 0.00022 19.55 N/A
512, 512 False True 0.01724 N/A 0.00030 58.44 N/A
512, 512 True False 0.01893 0.00790 0.00029 66.42 27.73
512, 512 True True 0.01867 N/A 0.00030 62.65 N/A
2048, 2048 False True 0.33893 N/A 0.00154 220.25 N/A
2048, 2048 True False 0.40912 0.08219 0.00154 266.11 53.46
2048, 2048 True True 0.39056 N/A 0.00162 241.76 N/A
4096, 4096 False True 1.58069 N/A 0.00418 377.92 N/A
4096, 4096 True False 1.81860 0.55152 0.00418 435.36 132.03
4096, 4096 True True 1.80671 N/A 0.00448 402.85 N/A
32, 32, 32 False True 0.00319 N/A 0.00017 19.05 N/A
32, 32, 32 True False 0.00359 0.00514 0.00011 33.25 47.68
32, 32, 32 True True 0.00355 N/A 0.00019 18.37 N/A
128, 128, 128 False True 0.27217 N/A 0.00059 458.70 N/A
128, 128, 128 True False 0.31409 0.03384 0.00050 633.12 68.21
128, 128, 128 True True 0.31207 N/A 0.00066 474.26 N/A
256, 256, 256 False True 3.01718 N/A 0.00304 993.26 N/A
256, 256, 256 True False 3.32660 0.28979 0.00235 1418.07 123.53
256, 256, 256 True True 3.31874 N/A 0.00352 942.49 N/A
384, 384, 384 False True 10.67328 N/A 0.01000 1067.34 N/A
384, 384, 384 True False 11.69219 1.11274 0.00771 1517.46 144.42
384, 384, 384 True True 11.76073 N/A 0.01161 1013.03 N/A

grlee77 avatar Sep 06 '22 15:09 grlee77