HDBSCAN on GPU?

Open esvhd opened this issue 7 years ago • 11 comments

Hi guys,

Are there any plans to develop a GPU-enabled version of this amazing algorithm/package?

Unfortunately, I have to admit that I'm new to GPU programming, so I won't be able to contribute anytime soon...

Thanks.

esvhd, May 05 '17 20:05

I'm not really a GPU programmer either, so it isn't on my immediate horizon. If I ever get around to writing an experimental version using Numba instead of Cython, I will try to ensure that it can accept GPUs as a target architecture, but I really don't have any immediate plans to embark on that.

lmcinnes, May 05 '17 22:05

Noted, thanks for getting back. Thought it was worth asking anyway. Perhaps others would be able to pick this up in the future.

esvhd, May 05 '17 22:05

I would be happy to support any efforts -- I'm just not in a position to make them myself right now. Thanks for the suggestion though.

lmcinnes, May 05 '17 23:05

@lmcinnes I work with high-level CNNs in PyTorch (best library for GPU + CNNs). I found your library and the educational materials in the docs amazing.

If there is some step or function in the HDBSCAN pipeline that involves heavy matrix multiplication (to be honest, that is what CNN libraries do best and what I am familiar with), I could lend a hand in porting it to PyTorch.

snakers4, Jan 25 '18 04:01

Thanks for the offer. Unfortunately current approaches to the algorithm do not lend themselves well to linear algebra expressibility.

lmcinnes, Jan 25 '18 14:01
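
To illustrate the point about linear algebra, here is a rough pure-Python sketch of two early HDBSCAN stages: mutual reachability distances and a minimum spanning tree built with Prim's algorithm. It is an illustration only, not the code used in this package; the function names and the toy data are made up for the example. The pairwise distances could be computed in bulk, but each step of the tree construction depends on state updated by the previous step, which is not a matrix multiplication.

```python
import math

def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    core = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        core.append(dists[k])  # dists[0] is the point itself (0.0)
    return core

def mutual_reachability(points, core, i, j):
    """max(core_i, core_j, d(i, j)): the edge weight HDBSCAN clusters on."""
    return max(core[i], core[j], math.dist(points[i], points[j]))

def prim_mst(points, k):
    """Minimum spanning tree over mutual reachability distances (O(n^2) Prim)."""
    n = len(points)
    core = core_distances(points, k)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Sequential step: the choice depends on every previous iteration.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = mutual_reachability(points, core, u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    # Sorting edges by weight gives the merge order of the single-linkage tree.
    return sorted(edges, key=lambda e: e[2])

# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = prim_mst(points, k=2)
for a, b, w in edges:
    print(a, b, round(w, 3))
```

In this toy run the heaviest edge is the one bridging the two groups, so cutting it would separate them. The later stages (condensing the tree and selecting stable clusters) are similarly tree-structured rather than matrix-shaped.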

Alas (

snakers4, Jan 25 '18 15:01

We are interested in doing this on RAPIDS cuML at some point. We're not sure when, but it's on our radar.

cjnolet, May 01 '20 23:05

@cjnolet any update on this?

den-run-ai, Feb 08 '21 17:02

@denfromufa RAPIDS 21.10 now contains a GPU-accelerated HDBSCAN, which makes use of the great work done by Leland McInnes and John Healy in this project!

https://developer.nvidia.com/blog/gpu-accelerated-hierarchical-dbscan-with-rapids-cuml-lets-get-back-to-the-future/

It took a while to develop, but we managed to gain a few other algorithms as building blocks along the way: RAPIDS now contains a minimum spanning tree and single-linkage hierarchical clustering in addition to HDBSCAN.

cjnolet, Oct 09 '21 13:10

@cjnolet is cuml.experimental usually available on Kaggle?

lcrmorin, Oct 10 '21 02:10

@lcrmorin the gists in the blog were incorrect. We moved HDBSCAN out of the experimental package for 21.10, so it should now be cuml.cluster.HDBSCAN. I've fixed the gists in the blog, so they should be correct now.

cjnolet, Oct 11 '21 19:10
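
For anyone landing here later, a minimal sketch of the import path mentioned above, with a fallback to the CPU package from this repository. The helper name `load_hdbscan_class` is made up for this example, and the fallback behaviour is an assumption about what you might want, not something either library provides.

```python
def load_hdbscan_class():
    """Return (HDBSCAN class or None, backend name)."""
    try:
        # Location given above for RAPIDS 21.10 and later.
        from cuml.cluster import HDBSCAN
        return HDBSCAN, "cuml"
    except Exception:
        # cuML is not installed, or no usable GPU/driver was found.
        pass
    try:
        # CPU implementation from this repository.
        from hdbscan import HDBSCAN
        return HDBSCAN, "hdbscan"
    except Exception:
        return None, "none"

HDBSCAN, backend = load_hdbscan_class()
print("HDBSCAN backend:", backend)
```

If a backend is found, something like `HDBSCAN(min_cluster_size=5).fit_predict(X)` should work with either class, where `X` is your own array of samples; other parameters may differ between the two, so check each library's docs.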