cucim
cucim copied to clipboard
Improve performance of Euclidean distance transform
This PR gives a further 2-4x speedup to the 2D and 3D distance transform functions by creating dedicated elementwise CUDA kernels for the pre- and post-processing steps. These include, for example, packing and upacking the 3D coordinates into a single integer. This resolves various "TODO" comments that were in the source, but I don't think we have an issue open for this.
Aside from the performance benefit there is also a substantial reduction in memory usage for the default case of return_distance=True, return_indices=False
as in this case the explicit arrays of indices never need to be created.
Benchmarks
Implementation from release 22.08.01
shape | return dist. | return ind. | SciPy | ITK | cuCIM | accel. (CuPy vs SciPy) | accel. (CuPy vs ITK) |
---|---|---|---|---|---|---|---|
256, 256 | False | True | 0.00379 | ---- | 0.00081 | 4.66 | ---- |
256, 256 | True | False | 0.00428 | 0.00383 | 0.00102 | 4.21 | 3.76 |
256, 256 | True | True | 0.00429 | ---- | 0.00107 | 4.02 | ---- |
512, 512 | False | True | 0.01724 | ---- | 0.00091 | 18.97 | ---- |
512, 512 | True | False | 0.01866 | 0.00788 | 0.00104 | 17.94 | 7.58 |
512, 512 | True | True | 0.01876 | ---- | 0.00107 | 17.48 | ---- |
2048, 2048 | False | True | 0.35053 | ---- | 0.00256 | 137.03 | ---- |
2048, 2048 | True | False | 0.39468 | 0.08220 | 0.00288 | 137.02 | 28.54 |
2048, 2048 | True | True | 0.39356 | ---- | 0.00296 | 133.04 | ---- |
4096, 4096 | False | True | 1.57430 | ---- | 0.00736 | 213.85 | ---- |
4096, 4096 | True | False | 1.79513 | 0.54968 | 0.00858 | 209.34 | 64.10 |
4096, 4096 | True | True | 1.81570 | ---- | 0.00889 | 204.25 | ---- |
32, 32, 32 | False | True | 0.00319 | ---- | 0.00069 | 4.62 | ---- |
32, 32, 32 | True | False | 0.00356 | 0.00481 | 0.00096 | 3.71 | 5.01 |
32, 32, 32 | True | True | 0.00360 | ---- | 0.00103 | 3.49 | ---- |
128, 128, 128 | False | True | 0.27901 | ---- | 0.00131 | 213.20 | ---- |
128, 128, 128 | True | False | 0.31013 | 0.04504 | 0.00156 | 199.32 | 28.94 |
128, 128, 128 | True | True | 0.31289 | ---- | 0.00163 | 192.07 | ---- |
256, 256, 256 | False | True | 3.02435 | ---- | 0.00696 | 434.25 | ---- |
256, 256, 256 | True | False | 3.34077 | 0.33528 | 0.00855 | 390.57 | 39.20 |
256, 256, 256 | True | True | 3.32920 | ---- | 0.00915 | 363.89 | ---- |
384, 384, 384 | False | True | 10.68973 | ---- | 0.02284 | 468.07 | ---- |
384, 384, 384 | True | False | 11.73340 | 1.16172 | 0.02809 | 417.65 | 41.35 |
384, 384, 384 | True | True | 11.68002 | ---- | 0.03007 | 388.40 | ---- |
Proposed Implementation
shape | return dist. | return ind. | SciPy | ITK | cuCIM | accel. (CuPy vs SciPy) | accel. (CuPy vs ITK) |
---|---|---|---|---|---|---|---|
256, 256 | False | True | 0.00385 | N/A | 0.00021 | 18.49 | N/A |
256, 256 | True | False | 0.00433 | 0.00370 | 0.00021 | 20.84 | 17.83 |
256, 256 | True | True | 0.00436 | N/A | 0.00022 | 19.55 | N/A |
512, 512 | False | True | 0.01724 | N/A | 0.00030 | 58.44 | N/A |
512, 512 | True | False | 0.01893 | 0.00790 | 0.00029 | 66.42 | 27.73 |
512, 512 | True | True | 0.01867 | N/A | 0.00030 | 62.65 | N/A |
2048, 2048 | False | True | 0.33893 | N/A | 0.00154 | 220.25 | N/A |
2048, 2048 | True | False | 0.40912 | 0.08219 | 0.00154 | 266.11 | 53.46 |
2048, 2048 | True | True | 0.39056 | N/A | 0.00162 | 241.76 | N/A |
4096, 4096 | False | True | 1.58069 | N/A | 0.00418 | 377.92 | N/A |
4096, 4096 | True | False | 1.81860 | 0.55152 | 0.00418 | 435.36 | 132.03 |
4096, 4096 | True | True | 1.80671 | N/A | 0.00448 | 402.85 | N/A |
32, 32, 32 | False | True | 0.00319 | N/A | 0.00017 | 19.05 | N/A |
32, 32, 32 | True | False | 0.00359 | 0.00514 | 0.00011 | 33.25 | 47.68 |
32, 32, 32 | True | True | 0.00355 | N/A | 0.00019 | 18.37 | N/A |
128, 128, 128 | False | True | 0.27217 | N/A | 0.00059 | 458.70 | N/A |
128, 128, 128 | True | False | 0.31409 | 0.03384 | 0.00050 | 633.12 | 68.21 |
128, 128, 128 | True | True | 0.31207 | N/A | 0.00066 | 474.26 | N/A |
256, 256, 256 | False | True | 3.01718 | N/A | 0.00304 | 993.26 | N/A |
256, 256, 256 | True | False | 3.32660 | 0.28979 | 0.00235 | 1418.07 | 123.53 |
256, 256, 256 | True | True | 3.31874 | N/A | 0.00352 | 942.49 | N/A |
384, 384, 384 | False | True | 10.67328 | N/A | 0.01000 | 1067.34 | N/A |
384, 384, 384 | True | False | 11.69219 | 1.11274 | 0.00771 | 1517.46 | 144.42 |
384, 384, 384 | True | True | 11.76073 | N/A | 0.01161 | 1013.03 | N/A |