hdbscan
HDBSCAN on GPU?
Hi guys,
Will there be any plan to develop a GPU enabled version of this amazing algorithm/package?
Unfortunately, I have to admit that I'm new to GPU programming, so I won't be able to contribute anytime soon...
Thanks.
I'm not really a GPU programmer either, so it isn't on my immediate horizon. If I ever get around to writing an experimental version using numba instead of cython I will try to ensure that it can accept GPUs as a target architecture, but I really don't have any immediate plans to embark on that.
Noted, thanks for getting back. Thought it was worth asking anyway. Perhaps others would be able to pick this up in the future.
I would be happy to support any efforts -- I'm just not in a position to make them myself right now. Thanks for the suggestion though.
@lmcinnes I do some work with high-level CNNs in PyTorch (the best library for GPUs + CNNs). I found your library and the educational materials in the docs amazing.
If there is some step or function in the HDBSCAN pipeline that involves heavy matrix multiplication (to be honest, that is what CNN libraries do best and what I am familiar with), I could lend a hand in porting it to PyTorch.
Thanks for the offer. Unfortunately, current approaches to the algorithm do not lend themselves well to being expressed as linear algebra.
Alas (
We are interested in doing this on RAPIDS cuML at some point. We're not sure when, but it's on our radar.
@cjnolet any update on this?
@denfromufa RAPIDS 21.10 now contains a GPU-accelerated HDBSCAN, which makes use of the great work done by Leland McInnes and John Healy in this project!
https://developer.nvidia.com/blog/gpu-accelerated-hierarchical-dbscan-with-rapids-cuml-lets-get-back-to-the-future/
It took a while to develop, but we gained a few other algorithms as building blocks along the way: RAPIDS now contains a minimum spanning tree and single-linkage hierarchical clustering in addition to HDBSCAN.
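To illustrate why a minimum spanning tree and single-linkage clustering are natural building blocks, here is a minimal CPU sketch in NumPy (not the cuML or hdbscan implementation). It computes mutual reachability distances, builds an MST over them with Prim's algorithm, and replays the MST edges in ascending order with union-find to get the single-linkage merge sequence. The function names are hypothetical, the dense O(n^2) distance matrix is only suitable for small inputs, and the core-distance convention (the point itself is not counted) is an assumption; libraries differ on it. HDBSCAN would then condense this hierarchy using `min_cluster_size`, which the sketch omits.

```python
import numpy as np

def mutual_reachability(X, min_samples=5):
    """Dense mutual reachability matrix (hypothetical helper, O(n^2) memory)."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0), so column
    # min_samples is the distance to the min_samples-th other point.
    core = np.sort(d, axis=1)[:, min_samples]
    # mreach(a, b) = max(core(a), core(b), d(a, b))
    return np.maximum(d, np.maximum(core[:, None], core[None, :]))

def prim_mst(W):
    """Prim's algorithm on a dense weight matrix; returns n-1 (u, v, weight) edges."""
    n = W.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)          # cheapest known edge into the tree
    parent = np.full(n, -1)
    best[0] = 0.0
    edges = []
    for _ in range(n):
        v = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[v] = True
        if parent[v] >= 0:
            edges.append((int(parent[v]), v, float(best[v])))
        # Relax edges from the newly added vertex
        update = (~in_tree) & (W[v] < best)
        best[update] = W[v][update]
        parent[update] = v
    return edges

def single_linkage(edges, n):
    """Merge components along MST edges in ascending weight order (union-find)."""
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]    # path halving
            i = root[i]
        return i

    merges = []
    for a, b, w in sorted(edges, key=lambda e: e[2]):
        ra, rb = find(a), find(b)      # MST edges never close a cycle, so ra != rb
        root[rb] = ra
        merges.append((ra, rb, w))
    return merges
```

For two well-separated groups of points, every merge height is small except the last one, which joins the two groups; that large jump in the hierarchy is what HDBSCAN's cluster extraction looks for.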
@cjnolet is cuml.experimental usually available on Kaggle?
@lcrmorin the gists in the blog were incorrect. We moved HDBSCAN out of the experimental packages for 21.10 so it should now be cuml.cluster.HDBSCAN. I've fixed the gists in the blog so they should be correct now.