kmcuda icon indicating copy to clipboard operation
kmcuda copied to clipboard

knn_cuda much slower than CPU implementation

Open fferroni opened this issue 7 years ago • 1 comments

Hi,

Was trying to use this for a GPU accelerated knn implementation. For example, finding the closest neighbor to a set of points.

import time
import numpy
from libKMCUDA import knn_cuda

numpy.random.seed(0)
arr = numpy.random.rand(10000, 3).astype(numpy.float32)
ca  = numpy.arange(10000, dtype=numpy.uint32)
timings = []
for i in range(10):
    if i < 5: continue
    start = time.time()
    neighbors1 = knn_cuda(1, arr, arr, ca, metric="L2", verbosity=0, device=0)
    timings.append(time.time() - start)

print("CUDA:", numpy.mean(timings))
print(neighbors1)

from scipy.spatial import cKDTree

timings = []
for i in range(10):
    if i < 5: continue
    start = time.time()
    tree = cKDTree(arr)
    neighbors2 = tree.query(arr, 2)
    timings.append(time.time() - start)

print("Scipy:", numpy.mean(timings))
print(neighbors2)

assert numpy.all(numpy.array(neighbors1[:, 0]) == neighbors2[1][:, 1])

I get something like:

CUDA: 0.26066293716430666
[[9005]
 [5850]
 [3095]
 ...
 [6974]
 [2078]
 [ 603]]
Scipy: 0.013833379745483399
(array([[0.        , 0.03410776],
       [0.        , 0.02017157],
       [0.        , 0.02988571],
       ...,
       [0.        , 0.02020105],
       [0.        , 0.01357323],
       [0.        , 0.0398004 ]]), array([[   0, 9005],
       [   1, 5850],
       [   2, 3095],
       ...,
       [9997, 6974],
       [9998, 2078],
       [9999,  603]]))

So the CUDA version is >20x slower than the CPU Scipy version?

fferroni avatar Sep 19 '18 16:09 fferroni

The underlying algorithms are different. kmcuda uses spatial cluster index and scipy's is KD tree.

  1. You are using knn_cuda wrong: you feed the same points as centroids so the clever triangle inequality trick becomes useless. You need to build the spatial index first - that is, cluster your points to the optimal number of clusters (a hyperparameter). In your particular artificial random case you can feed a uniform grid instead.
  2. For large dimensions (20 is already large) do not expect this to run significantly faster than brute force.

Taken from https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.html KMCUDA targets dimensions 100-10000.

vmarkovtsev avatar Sep 20 '18 09:09 vmarkovtsev