kmcuda
kmcuda copied to clipboard
knn_cuda much slower than CPU implementation
Hi,
Was trying to use this for a GPU accelerated knn implementation. For example, finding the closest neighbor to a set of points.
import time
import numpy
from libKMCUDA import knn_cuda
numpy.random.seed(0)
arr = numpy.random.rand(10000, 3).astype(numpy.float32)
ca = numpy.arange(10000, dtype=numpy.uint32)
timings = []
for i in range(10):
if i < 5: continue
start = time.time()
neighbors1 = knn_cuda(1, arr, arr, ca, metric="L2", verbosity=0, device=0)
timings.append(time.time() - start)
print("CUDA:", numpy.mean(timings))
print(neighbors1)
from scipy.spatial import cKDTree
timings = []
for i in range(10):
if i < 5: continue
start = time.time()
tree = cKDTree(arr)
neighbors2 = tree.query(arr, 2)
timings.append(time.time() - start)
print("Scipy:", numpy.mean(timings))
print(neighbors2)
assert numpy.all(numpy.array(neighbors1[:, 0]) == neighbors2[1][:, 1])
I get something like:
CUDA: 0.26066293716430666
[[9005]
[5850]
[3095]
...
[6974]
[2078]
[ 603]]
Scipy: 0.013833379745483399
(array([[0. , 0.03410776],
[0. , 0.02017157],
[0. , 0.02988571],
...,
[0. , 0.02020105],
[0. , 0.01357323],
[0. , 0.0398004 ]]), array([[ 0, 9005],
[ 1, 5850],
[ 2, 3095],
...,
[9997, 6974],
[9998, 2078],
[9999, 603]]))
So the CUDA version is >20x slower than the CPU Scipy version?
The underlying algorithms are different. kmcuda uses spatial cluster index and scipy's is KD tree.
- You are using
knn_cudawrong: you feed the same points as centroids so the clever triangle inequality trick becomes useless. You need to build the spatial index first - that is, cluster your points to the optimal number of clusters (a hyperparameter). In your particular artificial random case you can feed a uniform grid instead. -
For large dimensions (20 is already large) do not expect this to run significantly faster than brute force.
Taken from https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.html KMCUDA targets dimensions 100-10000.