cucim [FEA] Implementing Scikit-Image's Blob Detection

Hello to the cucim dev team,

@annendominik and I are currently implementing Scikit's image blob detection and want to add it to this beautiful lib cucim.

Is your feature request related to a problem? Please describe. Implementing blob detection from scikit-image with cupy backend

Describe the solution you'd like A lot of numpy/scipy functions can be one-by-one replaced with cupy functions, however, the scikit-image is using the ahead of time compiler pythran in its background. So the scikit-image code has a lot of for loops which need to be replaced in either cupy arrays or using cuda kernels. An example is the function _blob_overlap which is using the function spatial.cKDTree which is currently not implemented in cupyx.scipy. Also, _blob_overlap uses large for loops which should be avoided. (AFAIK it is not possible to cythonized/pythranized, but numba would be a possibility?)

I would like to write ElementwiseKernel instead of RawKernel if that's ok. Because they are easier to handle. Those kernel are going to be embedded in the *.py file.

Describe alternatives you've considered Alternative to Elementwise Kernel would be RawKernel and using *.cu files like proposed in https://github.com/rapidsai/cucim/issues/375. However, this needs more time and understanding of the GPU.

Additional context If someone would review the code for its fastness I would appreciate it. Currently, we just avoid for-loops as speed improvements.

Aug 31 '22 17:08 monzelr

Hi @monzelr, it would be great if you want to work on this.

Yes, scikit-image 0.19 did start using Pythran, however there is still much more Cython-based code than Pythran overall. Either way, this sort of compiled code is less trivially converted to CuPy-based solutions and typically requires use of ElementwiseKernel or RawModule as you mention.

There are currently a couple uses of scipy.spatial.cKDTree in cuCIM, but this is unfortunately done on the host at the moment due to the lack of a GPU equivalent in CuPy. (It is used in the private ensure_spacing helper used within feature.peak_local_max as well as within feature.corner_peaks).

Sep 06 '22 19:09 grlee77

I don't think Pythran is being used on the specific blob detection functions you asked about, though?

I also don't see any large for loops in _blob_overlap, but think perhaps you mean in _prune_blobs which does have such a loop over points returned by cKDTree.

One other tip: A few of the helper functions have many simple math operations (e.g. _compute_sphere_overlap). In that case, adding a simple @cupy.fuse() decorator to the function can fuse all of these into a single CUDA kernel and should make it several times more efficient than if each individual multiplication, division, etc. was a separate CUDA kernel. Basically fuse can only be used with a subset of functions, but all basic math operations and NumPy ufuncs should be supported.

We don't currently have Numba as a dependency and it is somewhat heavy-weight as it also pulls in LLVM, so l would prefer to try implementing using CuPy kernels.

Sep 06 '22 19:09 grlee77

I would initially try converting all components to the GPU except for _prune_blobs, as the other parts look reasonably straightforward. That should still give a performance benefit over scikit-image even if we have to fall back to the CPU for pruning for now.

Sep 06 '22 20:09 grlee77

Hi @grlee77,

yes, you are right: _prune_blobs is the function which needs a lot of time on our Jetson Xavier NX device with scikit-image version 1.1 (Python 3.6). However, this old code used itertools.combinations and not cKDTree in _prune_blobs. The Elementwise Kernel for _prune_blobs which was derived from scikit-image v1.1.x is working fine and now it is pretty fast on the Jetson. It is a kind of wasted time - but at least we learned more about ElementwiseKernel.

However, we are going to use the latest scikit-image version 1.9 now. 1.9 and 1.1 differs a lot and basically we are rewriting everything. Thank you for your input! I will have a look on cupy's fuse.

For those who are interested; that is the new cupy code _prune_blobs derived from scikit-image v1.1:

import cupy as cp
import itertools
from math import sqrt, hypot, acos
import math

prune_blobs_kernel = cp.ElementwiseKernel(
    "raw T blob, int64 n_array, float32 overlap",
    "raw T blob_out",
    r"""
    
    // *************************************************************************
    // This function is derived from Scikit-Image _blob_overlap (v0.11.x):
    // https://github.com/scikit-image/scikit-image/blob/dc10d457d1c36f18651f72b8a21dbeb55fcd34aa/skimage/feature/blob.py#L73
    // *************************************************************************
    
    float d = 0.0;
    float r1 = 0.0;
    float r2 = 0.0;
    float a = 0.0;
    float b = 0.0;
    float c = 0.0;
    float ratio1 = 0.0;
    float ratio2 = 0.0;
    float value = 0.0;
    const float pi = 3.141592653589793;
    const float root2 = 1.4142135623730951;

    //Debug purpose
    //printf("array at index %7d: %f %f %f\n", i, blob[i * 3 + 0],  blob[i * 3 + 1],  blob[i * 3 + 2]);

    for(int k=0; k<(int)(n_array/3); k++)
    {
        // blob[i] --> blob1
        // blob[k] --> blob2
        
        value = 0.0;

        if (i >= k)
        {
            // skip calculation, it is done already
            continue;
        }

        // extent of the blob is given by sqrt(2)*scale
        r1 = blob[3*i + 2] * root2;
        r2 = blob[3*k + 2] * root2;
        d = hypot(blob[3*i + 0] - blob[3*k + 0], blob[3*i + 1] - blob[3*k + 1]);
        
        if (d > r1 + r2)
        {
            value = 0.0;
        }
        else if (d <= abs(r1 - r2))
        {
            // one blob is inside the other, the smaller blob must die
            value = 1.0;
        }
        else
        {
            ratio1 = (d*d + r1*r1 - r2*r2) / (2 * d * r1);
            if (ratio1 > 1)
            {
                ratio1 = 1;
            }
            else if (ratio1 < -1)
            {
                ratio1 = -1;
            }

            ratio2 = (d*d + r2*r2 - r1*r1) / (2 * d * r2);
            if (ratio2 > 1)
            {
                ratio2 = 1;
            }
            else if (ratio2 < -1)
            {
                ratio2 = -1;
            }

            a = -d + r2 + r1;
            b = d - r2 + r1;
            c = d + r2 - r1;
            d = d + r2 + r1;

            float r = min(r1, r2);
            value = (r1*r1 * acos(ratio1) * r2*r2 * acos(ratio2) - 0.5 * sqrt(abs(a * b * c * d))) / (pi * r*r);
            
        }
        
        if (value > overlap)
        {                
            if (blob[3*i + 2] <= blob[k*3 + 2])
            {
                blob_out[3*i + 2] = -1.0;
            }
            else
            {
                blob_out[3*k + 2] = -1.0;
            }
        }
    }
    """,
    "prune_blobs_kernel")


def prune_blobs_cu(blobs_array: cp.ndarray, overlap: float) -> cp.ndarray:
    """Eliminated blobs with area overlap.

    Parameters
    ----------
    blobs_array : ndarray
        A 2d array with each row representing 3 (or 4) values,
        ``(row, col, sigma)`` or ``(pln, row, col, sigma)`` in 3D,
        where ``(row, col)`` (``(pln, row, col)``) are coordinates of the blob
        and ``sigma`` is the standard deviation of the Gaussian kernel which
        detected the blob.
        This array must not have a dimension of size 0.
    overlap : float
        A value between 0 and 1. If the fraction of area overlapping for 2
        blobs is greater than `overlap` the smaller blob is eliminated.
    Returns
    -------
    A : ndarray
        `array` with overlapping blobs removed.
    """

    prune_blobs_kernel(blobs_array, blobs_array.size, overlap, blobs_array, size=blobs_array.shape[0])
    _return = blobs_array[blobs_array[:, 2] > 0, :]
    return _return


def prune_blobs_cu_old(blobs_array, overlap):
    """Eliminated blobs with area overlap"""

    # more iterations might eliminate more blobs, but one suffices
    # for most cases
    for blob1, blob2 in itertools.combinations(blobs_array, 2):
        value = _prune_blobs_blob_overlap_old(blob1, blob2)
        if value > overlap:
            if blob1[2] > blob2[2]:
                blob2[2] = -1
            else:
                blob1[2] = -1

    # return blobs_array[blobs_array[:,2] > 0]
    return cp.array([b for b in blobs_array if b[2] > 0])


def _prune_blobs_blob_overlap_old(blob1, blob2):
    """finds the overlapping area fraction between two blobs"""

    root2 = sqrt(2)

    # extent of the blob is given by sqrt(2)*scale
    r1 = blob1[2] * root2
    r2 = blob2[2] * root2

    d = hypot(blob1[0] - blob2[0], blob1[1] - blob2[1])
    # print("d=%1.1f" % d)
    if d > r1 + r2:
        return 0

    # one blob is inside the other, the smaller blob must die
    if d <= abs(r1 - r2):
        return 1

    ratio1 = (d ** 2 + r1 ** 2 - r2 ** 2) / (2 * d * r1)
    ratio1 = cp.clip(ratio1, -1, 1)
    acos1 = cp.arccos(ratio1)

    ratio2 = (d ** 2 + r2 ** 2 - r1 ** 2) / (2 * d * r2)
    ratio2 = cp.clip(ratio2, -1, 1)
    acos2 = cp.arccos(ratio2)

    a = -d + r2 + r1
    b = d - r2 + r1
    c = d + r2 - r1
    d = d + r2 + r1
    area = r1 ** 2 * acos1 * r2 ** 2 * acos2 - 0.5 * sqrt(abs(a * b * c * d))

    return area / (math.pi * (min(r1, r2) ** 2))


if __name__ == "__main__":
    # just a small example how to use

    array = cp.array([[401., 123., 10.666667],
                      [402., 126., 1.],
                      [416., 123., 7.4444447],
                      [499., 386., 4.2222223],
                      [166., 254., 4.2222223],
                      [218., 146., 4.2222223]], dtype=cp.float32)

    print("\n")
    old = prune_blobs_cu_old(array.copy(), 0.5)
    new = prune_blobs_cu(array.copy(), 0.5)

    print("input:\n", array)
    print("old:\n", old)
    print("new:\n", new)
    print("True? -> ", cp.array_equal(old, new))

Sep 07 '22 07:09 monzelr

Oh, great, I didn't realize you already had something working. Please make a PR when you are ready and we can help review.

scikit-image version 1.1 (Python 3.6).

1.1 sounds like it may be a scikit-learn version? The latest is 0.19.3 for scikit-image :)

Sep 07 '22 13:09 grlee77

Yes, I meant scikit-image v0.11 x)

Sep 07 '22 13:09 monzelr

closed by #413

Nov 30 '22 02:11 grlee77

cucim cucim copied to clipboard

[FEA] Implementing Scikit-Image's Blob Detection

cucim
cucim copied to clipboard