cucim
cucim copied to clipboard
[FEA] Implementing Scikit-Image's Blob Detection
Hello to the cucim dev team,
@annendominik and I are currently implementing Scikit's image blob detection and want to add it to this beautiful lib cucim
.
Is your feature request related to a problem? Please describe. Implementing blob detection from scikit-image with cupy backend
Describe the solution you'd like
A lot of numpy/scipy functions can be one-by-one replaced with cupy functions, however, the scikit-image is using the ahead of time compiler pythran in its background. So the scikit-image
code has a lot of for loops which need to be replaced in either cupy arrays or using cuda kernels. An example is the function _blob_overlap
which is using the function spatial.cKDTree
which is currently not implemented in cupyx.scipy
. Also, _blob_overlap
uses large for loops which should be avoided. (AFAIK it is not possible to cythonized/pythranized, but numba
would be a possibility?)
I would like to write ElementwiseKernel
instead of RawKernel
if that's ok. Because they are easier to handle. Those kernel are going to be embedded in the *.py file.
Describe alternatives you've considered Alternative to Elementwise Kernel would be RawKernel and using *.cu files like proposed in https://github.com/rapidsai/cucim/issues/375. However, this needs more time and understanding of the GPU.
Additional context If someone would review the code for its fastness I would appreciate it. Currently, we just avoid for-loops as speed improvements.
Hi @monzelr, it would be great if you want to work on this.
Yes, scikit-image 0.19 did start using Pythran, however there is still much more Cython-based code than Pythran overall. Either way, this sort of compiled code is less trivially converted to CuPy-based solutions and typically requires use of ElementwiseKernel
or RawModule
as you mention.
There are currently a couple uses of scipy.spatial.cKDTree
in cuCIM, but this is unfortunately done on the host at the moment due to the lack of a GPU equivalent in CuPy. (It is used in the private ensure_spacing
helper used within feature.peak_local_max
as well as within feature.corner_peaks
).
I don't think Pythran is being used on the specific blob detection functions you asked about, though?
I also don't see any large for loops in _blob_overlap
, but think perhaps you mean in _prune_blobs
which does have such a loop over points returned by cKDTree
.
One other tip: A few of the helper functions have many simple math operations (e.g. _compute_sphere_overlap
). In that case, adding a simple @cupy.fuse()
decorator to the function can fuse all of these into a single CUDA kernel and should make it several times more efficient than if each individual multiplication, division, etc. was a separate CUDA kernel. Basically fuse
can only be used with a subset of functions, but all basic math operations and NumPy ufuncs should be supported.
We don't currently have Numba as a dependency and it is somewhat heavy-weight as it also pulls in LLVM, so l would prefer to try implementing using CuPy kernels.
I would initially try converting all components to the GPU except for _prune_blobs
, as the other parts look reasonably straightforward. That should still give a performance benefit over scikit-image even if we have to fall back to the CPU for pruning for now.
Hi @grlee77,
yes, you are right: _prune_blobs
is the function which needs a lot of time on our Jetson Xavier NX device with scikit-image
version 1.1 (Python 3.6). However, this old code used itertools.combinations
and not cKDTree
in _prune_blobs
. The Elementwise Kernel for _prune_blobs
which was derived from scikit-image
v1.1.x is working fine and now it is pretty fast on the Jetson. It is a kind of wasted time - but at least we learned more about ElementwiseKernel
.
However, we are going to use the latest scikit-image version 1.9 now. 1.9 and 1.1 differs a lot and basically we are rewriting everything.
Thank you for your input! I will have a look on cupy's fuse
.
For those who are interested; that is the new cupy code _prune_blobs
derived from scikit-image
v1.1:
import cupy as cp
import itertools
from math import sqrt, hypot, acos
import math
prune_blobs_kernel = cp.ElementwiseKernel(
"raw T blob, int64 n_array, float32 overlap",
"raw T blob_out",
r"""
// *************************************************************************
// This function is derived from Scikit-Image _blob_overlap (v0.11.x):
// https://github.com/scikit-image/scikit-image/blob/dc10d457d1c36f18651f72b8a21dbeb55fcd34aa/skimage/feature/blob.py#L73
// *************************************************************************
float d = 0.0;
float r1 = 0.0;
float r2 = 0.0;
float a = 0.0;
float b = 0.0;
float c = 0.0;
float ratio1 = 0.0;
float ratio2 = 0.0;
float value = 0.0;
const float pi = 3.141592653589793;
const float root2 = 1.4142135623730951;
//Debug purpose
//printf("array at index %7d: %f %f %f\n", i, blob[i * 3 + 0], blob[i * 3 + 1], blob[i * 3 + 2]);
for(int k=0; k<(int)(n_array/3); k++)
{
// blob[i] --> blob1
// blob[k] --> blob2
value = 0.0;
if (i >= k)
{
// skip calculation, it is done already
continue;
}
// extent of the blob is given by sqrt(2)*scale
r1 = blob[3*i + 2] * root2;
r2 = blob[3*k + 2] * root2;
d = hypot(blob[3*i + 0] - blob[3*k + 0], blob[3*i + 1] - blob[3*k + 1]);
if (d > r1 + r2)
{
value = 0.0;
}
else if (d <= abs(r1 - r2))
{
// one blob is inside the other, the smaller blob must die
value = 1.0;
}
else
{
ratio1 = (d*d + r1*r1 - r2*r2) / (2 * d * r1);
if (ratio1 > 1)
{
ratio1 = 1;
}
else if (ratio1 < -1)
{
ratio1 = -1;
}
ratio2 = (d*d + r2*r2 - r1*r1) / (2 * d * r2);
if (ratio2 > 1)
{
ratio2 = 1;
}
else if (ratio2 < -1)
{
ratio2 = -1;
}
a = -d + r2 + r1;
b = d - r2 + r1;
c = d + r2 - r1;
d = d + r2 + r1;
float r = min(r1, r2);
value = (r1*r1 * acos(ratio1) * r2*r2 * acos(ratio2) - 0.5 * sqrt(abs(a * b * c * d))) / (pi * r*r);
}
if (value > overlap)
{
if (blob[3*i + 2] <= blob[k*3 + 2])
{
blob_out[3*i + 2] = -1.0;
}
else
{
blob_out[3*k + 2] = -1.0;
}
}
}
""",
"prune_blobs_kernel")
def prune_blobs_cu(blobs_array: cp.ndarray, overlap: float) -> cp.ndarray:
"""Eliminated blobs with area overlap.
Parameters
----------
blobs_array : ndarray
A 2d array with each row representing 3 (or 4) values,
``(row, col, sigma)`` or ``(pln, row, col, sigma)`` in 3D,
where ``(row, col)`` (``(pln, row, col)``) are coordinates of the blob
and ``sigma`` is the standard deviation of the Gaussian kernel which
detected the blob.
This array must not have a dimension of size 0.
overlap : float
A value between 0 and 1. If the fraction of area overlapping for 2
blobs is greater than `overlap` the smaller blob is eliminated.
Returns
-------
A : ndarray
`array` with overlapping blobs removed.
"""
prune_blobs_kernel(blobs_array, blobs_array.size, overlap, blobs_array, size=blobs_array.shape[0])
_return = blobs_array[blobs_array[:, 2] > 0, :]
return _return
def prune_blobs_cu_old(blobs_array, overlap):
"""Eliminated blobs with area overlap"""
# more iterations might eliminate more blobs, but one suffices
# for most cases
for blob1, blob2 in itertools.combinations(blobs_array, 2):
value = _prune_blobs_blob_overlap_old(blob1, blob2)
if value > overlap:
if blob1[2] > blob2[2]:
blob2[2] = -1
else:
blob1[2] = -1
# return blobs_array[blobs_array[:,2] > 0]
return cp.array([b for b in blobs_array if b[2] > 0])
def _prune_blobs_blob_overlap_old(blob1, blob2):
"""finds the overlapping area fraction between two blobs"""
root2 = sqrt(2)
# extent of the blob is given by sqrt(2)*scale
r1 = blob1[2] * root2
r2 = blob2[2] * root2
d = hypot(blob1[0] - blob2[0], blob1[1] - blob2[1])
# print("d=%1.1f" % d)
if d > r1 + r2:
return 0
# one blob is inside the other, the smaller blob must die
if d <= abs(r1 - r2):
return 1
ratio1 = (d ** 2 + r1 ** 2 - r2 ** 2) / (2 * d * r1)
ratio1 = cp.clip(ratio1, -1, 1)
acos1 = cp.arccos(ratio1)
ratio2 = (d ** 2 + r2 ** 2 - r1 ** 2) / (2 * d * r2)
ratio2 = cp.clip(ratio2, -1, 1)
acos2 = cp.arccos(ratio2)
a = -d + r2 + r1
b = d - r2 + r1
c = d + r2 - r1
d = d + r2 + r1
area = r1 ** 2 * acos1 * r2 ** 2 * acos2 - 0.5 * sqrt(abs(a * b * c * d))
return area / (math.pi * (min(r1, r2) ** 2))
if __name__ == "__main__":
# just a small example how to use
array = cp.array([[401., 123., 10.666667],
[402., 126., 1.],
[416., 123., 7.4444447],
[499., 386., 4.2222223],
[166., 254., 4.2222223],
[218., 146., 4.2222223]], dtype=cp.float32)
print("\n")
old = prune_blobs_cu_old(array.copy(), 0.5)
new = prune_blobs_cu(array.copy(), 0.5)
print("input:\n", array)
print("old:\n", old)
print("new:\n", new)
print("True? -> ", cp.array_equal(old, new))
Oh, great, I didn't realize you already had something working. Please make a PR when you are ready and we can help review.
scikit-image version 1.1 (Python 3.6).
1.1 sounds like it may be a scikit-learn version? The latest is 0.19.3 for scikit-image :)
Yes, I meant scikit-image
v0.11 x)
closed by #413