[FEA] Superpixel Segmentation using GPU
Is your feature request related to a problem? Please describe.
Scikit-image has a separate module that aims to implement various N-dimensional superpixel segmentation methods (e.g. SLIC, Watershed, QuickShift, Felzenszwalb). Superpixels have broad applications when dealing with super-resolution images, as well as 3D images such as those in the medical domain. Unfortunately, the current implementations run on the CPU. Even though they are written in Cython, they could benefit greatly from GPU acceleration, as most of their operations can be parallelized.
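To illustrate why SLIC parallelizes well, here is a minimal NumPy sketch of one SLIC iteration on a single-channel N-D image. It is not scikit-image's code, and the function names are hypothetical. In the assignment step, each pixel's label depends only on its own value and the current cluster centers, so it maps naturally onto one GPU thread per pixel. The update step is a per-cluster reduction. For brevity, the sketch compares every pixel against every center. Real SLIC restricts the search to a window of about 2S around each center (S being the grid step), which keeps the cost linear in the number of pixels.

```python
import numpy as np


def slic_assign(image, centers_pos, centers_val, step, compactness):
    """Assign each pixel to the nearest center under the SLIC distance.

    Uses D^2 = d_color^2 + (compactness / step)^2 * d_space^2, which has the
    same argmin as the usual sqrt form. Full search over all K centers.
    """
    coords = np.indices(image.shape).reshape(image.ndim, -1).T  # (P, ndim)
    vals = image.reshape(-1)  # (P,)
    d_color2 = (vals[:, None] - centers_val[None, :]) ** 2  # (P, K)
    d_space2 = ((coords[:, None, :] - centers_pos[None, :, :]) ** 2).sum(-1)
    dist2 = d_color2 + (compactness / step) ** 2 * d_space2
    # Every pixel's argmin is independent: embarrassingly parallel.
    return dist2.argmin(axis=1).reshape(image.shape)


def slic_update(image, labels, n_clusters):
    """Recompute each center as the mean position/value of its pixels."""
    coords = np.indices(image.shape).reshape(image.ndim, -1).T.astype(float)
    flat = labels.reshape(-1)
    counts = np.bincount(flat, minlength=n_clusters).astype(float)
    counts[counts == 0] = 1.0  # empty clusters collapse to the origin here
    new_val = np.bincount(flat, weights=image.reshape(-1),
                          minlength=n_clusters) / counts
    new_pos = np.stack(
        [np.bincount(flat, weights=coords[:, d], minlength=n_clusters) / counts
         for d in range(image.ndim)],
        axis=1,
    )
    return new_pos, new_val
```

On a GPU, `slic_assign` would become a per-pixel kernel and `slic_update` a segmented reduction, for example with atomics or a sort-by-label.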
Describe the solution you'd like
A fast, GPU-accelerated implementation of superpixel algorithms in N dimensions.
Describe alternatives you've considered
Various libraries implement SLIC, for example, but they are either not fully compatible with the current Python/CUDA ecosystem or are written in another language altogether.
Additional context
scikit-image segmentation module:
- https://scikit-image.org/docs/stable/api/skimage.segmentation.html#module-skimage.segmentation
SLIC implementation as an extension to OpenCV in C++
- https://github.com/PSMM/SLIC-Superpixels
NumPy/OpenCV implementation in Python
- https://github.com/jayrambhia/superpixels-SLIC/blob/master/SLICcv.py
CPU implementation of SLIC claiming a nearly 40x speedup over skimage
- https://github.com/Algy/fast-slic
SLIC implementations using CUDA
- https://github.com/fderue/SLIC_CUDA
- https://github.com/painnick/gSLIC/tree/master/gSLIC
- https://github.com/rosalindfranklininstitute/cuda-slic
Hi @m-krastev
We would like to provide methods under cucim.skimage in a way that matches the scikit-image API (or at least some subset of it). It is okay for a function to support only a subset of kwargs/flags and raise NotImplementedError for the rest, or to raise NotImplementedError for dimensions other than 2D or 3D. I have not looked into options for superpixel methods recently, so thank you for providing links to existing implementations.
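As a rough illustration of that convention, here is a hypothetical argument-checking front end that reuses a few keyword names from scikit-image's `slic` (`n_segments`, `compactness`, `slic_zero`, `mask`, `channel_axis`). It accepts only 2D/3D spatial inputs and raises NotImplementedError for options a first GPU version might not cover. The function name and its particular validation rules are assumptions for this sketch, not an existing cuCIM or scikit-image API. A real implementation would launch the GPU kernel after these checks.

```python
import numpy as np


def check_slic_args(image, n_segments=100, compactness=10.0, *,
                    slic_zero=False, mask=None, channel_axis=-1):
    """Hypothetical validation step; returns the spatial shape if supported."""
    image = np.asarray(image)
    if channel_axis is None:
        spatial_shape = image.shape
    else:
        axis = channel_axis % image.ndim  # normalize negative axis values
        spatial_shape = tuple(s for i, s in enumerate(image.shape) if i != axis)
    # Only 2D and 3D spatial data in this sketch.
    if len(spatial_shape) not in (2, 3):
        raise NotImplementedError("only 2D and 3D images are supported")
    # Options not covered yet are rejected explicitly rather than ignored.
    if slic_zero:
        raise NotImplementedError("slic_zero=True is not supported yet")
    if mask is not None:
        raise NotImplementedError("mask is not supported yet")
    # Illustrative sanity checks (assumed, not copied from scikit-image).
    if n_segments < 1 or compactness <= 0:
        raise ValueError("n_segments must be >= 1 and compactness > 0")
    return spatial_shape
```

Raising instead of silently ignoring unsupported flags keeps results from ever diverging quietly from scikit-image for the same call.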
For the CUDA-based ones, I would avoid using gSLIC as a reference, since I didn't see any license terms in that repository, but the other two look compatible from a licensing standpoint. Have you tested either of them to see how closely they match what is in scikit-image? If we find an implementation that is performant and useful, but does not closely match scikit-image results, it may be possible to include it under cucim.core instead of the cucim.skimage module.
Thanks for the quick response @grlee77! I agree it's fine not to bother with dimensions beyond 2D and 3D in general. I should also point out that so far I've only used the SLIC superpixel implementations.
I have tested the CPU-based fast_slic and cuda-slic (the CuPy flavor). fast_slic seems to work fast enough, but is limited to 2D and may not be easy to translate to CUDA.
Here is a toy benchmark from my experiments. For the 30-iteration runs, the function is simply called 30 times in a for-loop.

- CPU: Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
- GPU: NVIDIA A100
- Input image: `image.shape=216x216x217`, `n_segments=15_000`, `compactness=0.01`, `multichannel=False`

| Implementation | Iterations | Total time (s) |
|---|---|---|
| scikit-image | 1 | 3.4363 |
| scikit-image | 30 | 94.4181 |
| cuda-slic | 1 | 9.1657 |
| cuda-slic | 30 | 26.3930 |
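The first-call cost can be roughly separated from the steady-state cost using only the timings in the table. This assumes every call after the first takes the same time, so that the per-call time is (t_30 − t_1) / 29 and the first-call overhead is t_1 minus that value. Attributing cuda-slic's overhead to JIT compilation is an interpretation of these numbers, not a separate measurement.

```python
# Reported totals (seconds) from the benchmark table.
T1_SKIMAGE, T30_SKIMAGE = 3.4363, 94.4181
T1_CUDA_SLIC, T30_CUDA_SLIC = 9.1657, 26.3930


def split_first_call(t_first, t_total, n_calls):
    """Estimate (steady per-call time, extra first-call overhead).

    Assumes calls 2..n all cost the same amount of time.
    """
    per_call = (t_total - t_first) / (n_calls - 1)
    overhead = t_first - per_call
    return per_call, overhead


sk_per_call, sk_overhead = split_first_call(T1_SKIMAGE, T30_SKIMAGE, 30)
cu_per_call, cu_overhead = split_first_call(T1_CUDA_SLIC, T30_CUDA_SLIC, 30)
steady_speedup = sk_per_call / cu_per_call

print(f"scikit-image: {sk_per_call:.3f} s/call, first-call extra {sk_overhead:.3f} s")
print(f"cuda-slic:    {cu_per_call:.3f} s/call, first-call extra {cu_overhead:.3f} s")
print(f"steady-state speedup: {steady_speedup:.2f}x")
```

Under that assumption, cuda-slic spends roughly 8.6 s on its first call and about 0.59 s per call afterwards, or roughly 5x faster than scikit-image once warmed up.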
**CUDA-SLIC**
+ Function signature mostly compatible with `scikit-image`
+ Implements most of the functionalities needed
- Uses JIT compilation, whose overhead can cancel out the GPU speedup for one-off function calls.
- Doesn't implement `slic_zero` (Adaptive-SLIC), which seems to produce more regular (convex) superpixels.
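To make the JIT effect visible in future benchmarks, the first call can be timed separately from the rest. Below is a minimal, library-agnostic timing helper (the name is hypothetical). GPU launches are usually asynchronous, so the optional `sync` callable should block until queued work finishes. With CuPy, for example, one could pass `cupy.cuda.Device().synchronize`. This is a sketch, not the harness used for the numbers above.

```python
import time
from statistics import mean


def time_first_and_steady(fn, *args, n_calls=30, sync=None, **kwargs):
    """Return (first_call_seconds, mean_seconds_of_remaining_calls).

    sync: optional zero-argument callable that waits for pending GPU work;
    without it, asynchronous kernel launches would be under-counted.
    """
    def timed_call():
        start = time.perf_counter()
        fn(*args, **kwargs)
        if sync is not None:
            sync()
        return time.perf_counter() - start

    first = timed_call()  # includes JIT compilation / cache warm-up, if any
    rest = [timed_call() for _ in range(n_calls - 1)]
    return first, (mean(rest) if rest else float("nan"))
```

Reporting both values shows whether a GPU implementation pays off only after warm-up, which matters for one-off calls.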
Overall, if cuda-slic, or an improved variant of it that addresses the above limitations, could be integrated into cucim.skimage, that would be a substantial improvement.
I took another look at the repos you linked and agree that cuda-slic is already based on the scikit-image API. It already has a CuPy implementation and reuses one of the Cython helper functions from scikit-image, so it should be fairly easy to adapt here without too much work.
I also looked briefly at SLIC_CUDA. It uses the older-style CUDA Texture/Surface APIs. On an initial look, I noticed at least one strange thing in that code: its RGBA->LAB conversion function computes the LAB components but then doesn't use them, writing the original RGBA values to the surface instead. Adapting that repo to cuCIM would require more refactoring than the cuda-slic case above, particularly if it were also extended to support 3D.
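For reference, here is what such a conversion step is expected to produce: the standard sRGB (D65 white point) to CIELAB conversion, written in NumPy. It is not SLIC_CUDA's code. It assumes float RGBA input in [0, 1] (8-bit values would first be divided by 255) and drops the alpha channel. The point is that the L, a, b values computed at the end are what should be written out.

```python
import numpy as np

# Linear sRGB -> XYZ matrix and D65 reference white (standard constants).
_RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                        [0.2126729, 0.7151522, 0.0721750],
                        [0.0193339, 0.1191920, 0.9503041]])
_WHITE_D65 = np.array([0.95047, 1.00000, 1.08883])
_DELTA = 6.0 / 29.0


def rgba_to_lab(rgba):
    """Convert (..., 4) sRGBA in [0, 1] to (..., 3) CIELAB; alpha is ignored."""
    rgb = np.asarray(rgba, dtype=float)[..., :3]
    # Undo the sRGB gamma curve to get linear RGB.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = lin @ _RGB_TO_XYZ.T / _WHITE_D65
    # CIELAB companding function.
    f = np.where(xyz > _DELTA ** 3, np.cbrt(xyz),
                 xyz / (3 * _DELTA ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    # These LAB values (not the input RGBA) are what should be stored.
    return np.stack([L, a, b], axis=-1)
```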
A third option for 2D data that claims high performance is gSLICr, but it is not compatible with cuCIM's license due to its non-commercial restrictions. There are gSLICrPy Python bindings for it, which may work for non-profit/research purposes.