
Possibility for GPU support?

Open rohanrajpal opened this issue 4 years ago • 4 comments

Thanks a lot for sharing this library.

I was wondering if we could have GPU support? I'm not sure how tough it would be, but I'd be glad to help! If anyone knows of resources on how to go about this, please do share.

rohanrajpal avatar Jul 22 '20 16:07 rohanrajpal

@rohanrajpal thanks for the message.

Years ago, when we built this package, we investigated a GPU solution. At the time we couldn't find a CUDA implementation of sparse-matrix-times-sparse-matrix multiplication.

Maybe things have changed. Any suggestions/ideas are welcome!

ymwdalex avatar Jul 24 '20 14:07 ymwdalex

@rohanrajpal RAPIDS cuML has a NearestNeighbors implementation. It currently supports Euclidean distance for dense matrices on the GPU, and they are working on cosine similarity for sparse matrices in future releases (probably 0.16 or 0.17). https://docs.rapids.ai/api/cuml/stable/api.html?highlight=neighbors#cuml.neighbors.NearestNeighbors

As a temporary solution, you can use CuPy sparse matrices and a plain dot product, as long as you don't run into memory limitations.

Considering the availability of these tools, GPU support may be out of scope for this package.
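
For reference, a minimal sketch of the CuPy workaround described above, assuming CuPy is installed with a matching CUDA toolkit; the shapes and densities here are purely illustrative:

```python
# Multiply two SciPy CSR matrices on the GPU via cupyx.scipy.sparse,
# then copy the result back to the host.
import numpy as np
import scipy.sparse as sp
import cupyx.scipy.sparse as cusp

# Random sparse matrices on the host (CPU).
a = sp.random(10_000, 5_000, density=0.01, format="csr", dtype=np.float32)
b = sp.random(5_000, 8_000, density=0.01, format="csr", dtype=np.float32)

# Copy to GPU memory; this transfer is part of the overhead discussed later.
a_gpu = cusp.csr_matrix(a)
b_gpu = cusp.csr_matrix(b)

# Sparse-times-sparse product on the GPU. Note: unlike sparse_dot_topn,
# this keeps *all* nonzeros of the result, so memory use can grow quickly.
c_gpu = a_gpu.dot(b_gpu)

# Copy the result back to a SciPy CSR matrix on the host.
c = c_gpu.get()
print(c.shape, c.nnz)
```

Unlike sparse_dot_topn, this gives you the full product rather than only the top-n entries per row, so you would still need to filter the result yourself.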

aerdem4 avatar Jul 24 '20 14:07 aerdem4

Thanks for sharing. I'll have a look into it.

rohanrajpal avatar Jul 24 '20 15:07 rohanrajpal

I would be very skeptical about the benefit of using a GPU for sparse matrix multiplication. Have a look at what various sources on the internet say about its poor cache locality and poor suitability for the long vector operations that GPU acceleration relies on for its speedups. Add to that the overhead of copying the inputs from RAM to GPU memory and the result back. Especially with large matrices that don't fit into GPU memory at once, I would not expect any speedup. Dense matrices, especially if they fit into GPU memory, are a completely different story though...
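
For anyone who wants to check this on their own hardware, here is a rough timing sketch, assuming CuPy with a working CUDA setup; the GPU timing deliberately includes the host-to-device and device-to-host copies, since those transfers are part of the real cost, and the matrix sizes and density are arbitrary:

```python
import time
import numpy as np
import scipy.sparse as sp
import cupy as cp
import cupyx.scipy.sparse as cusp

a = sp.random(20_000, 20_000, density=0.001, format="csr", dtype=np.float32)
b = sp.random(20_000, 20_000, density=0.001, format="csr", dtype=np.float32)

# CPU: SciPy sparse-times-sparse product.
t0 = time.perf_counter()
c_cpu = a @ b
cpu_s = time.perf_counter() - t0

# GPU: copy in, multiply, synchronize, copy out.
t0 = time.perf_counter()
a_gpu, b_gpu = cusp.csr_matrix(a), cusp.csr_matrix(b)
c_gpu = a_gpu.dot(b_gpu)
cp.cuda.Device().synchronize()  # make sure the kernel has finished
c_back = c_gpu.get()
gpu_s = time.perf_counter() - t0

print(f"CPU: {cpu_s:.3f}s  GPU (incl. transfers): {gpu_s:.3f}s")
```

Whether the GPU comes out ahead depends heavily on matrix size, sparsity pattern, and how often the data can stay resident on the device between operations.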

sarimak avatar Feb 23 '21 20:02 sarimak