[FEA] Support for HDBSCAN relative_validity

Open beckernick opened this issue 7 months ago • 8 comments

The CPU hdbscan library provides a relative_validity_ attribute, which is a "fast approximation of the Density Based Cluster Validity (DBCV) score".

It would be nice to support this in cuML, even if the computation itself ran on the CPU (it's fairly lightweight).

from sklearn.datasets import make_blobs
import cuml
import hdbscan # !pip install hdbscan

N = 1000
K = 10

X, y = make_blobs(
    n_samples=N,
    n_features=K,
    random_state=12
)

clusterer = hdbscan.HDBSCAN(gen_min_span_tree=True)
clusterer.fit(X)
print(clusterer.relative_validity_)
0.8025289855360727

Similar to https://github.com/rapidsai/cuml/issues/4497
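
To illustrate why this could reasonably run on the CPU, here is a rough sketch of a DBCV-style approximation computed from nothing but minimum spanning tree edges and cluster labels. It is loosely modeled on the general idea behind the CPU library's attribute (compare the longest edge inside each cluster with the shortest edge leaving it), but it is not a verified port and may not reproduce the same numbers. The function name `approx_relative_validity`, the `(from, to, distance)` edge layout, and the fallback constant of 2 for clusters with no outgoing edge are all assumptions made for illustration.

```python
import math


def approx_relative_validity(mst_edges, labels):
    """Approximate a DBCV-style score from MST edges and cluster labels.

    mst_edges: iterable of (from_index, to_index, distance) rows (assumed layout).
    labels: sequence of integer cluster labels, with -1 marking noise.
    """
    labels = [int(lab) for lab in labels]
    n_clusters = max(labels, default=-1) + 1
    if n_clusters == 0:
        raise ValueError("no clusters found; the score is undefined")

    total = len(labels)
    sizes = [0] * n_clusters
    for lab in labels:
        if lab >= 0:
            sizes[lab] += 1

    sparseness = [0.0] * n_clusters       # longest edge inside each cluster
    separation = [math.inf] * n_clusters  # shortest edge leaving each cluster
    max_distance = 0.0
    min_noise_sep = math.inf              # shortest cluster-to-noise edge

    # Single pass over the MST edges: O(n) work, no GPU needed.
    for u, v, dist in mst_edges:
        a, b = labels[int(u)], labels[int(v)]
        max_distance = max(max_distance, dist)
        if a == -1 and b == -1:
            continue
        if a == -1 or b == -1:
            min_noise_sep = min(min_noise_sep, dist)
            continue
        if a == b:
            sparseness[a] = max(sparseness[a], dist)
        else:
            separation[a] = min(separation[a], dist)
            separation[b] = min(separation[b], dist)

    # Clusters with no edge to another cluster get a large stand-in value
    # (assumption: twice the relevant reference distance).
    if math.isinf(min_noise_sep):
        min_noise_sep = max_distance
    fallback = 2 * (max_distance if n_clusters > 1 else min_noise_sep)
    separation = [fallback if math.isinf(s) else s for s in separation]

    # Size-weighted average of per-cluster validity in [-1, 1].
    score = 0.0
    for i in range(n_clusters):
        denom = max(separation[i], sparseness[i])
        validity = (separation[i] - sparseness[i]) / denom if denom > 0 else 0.0
        score += sizes[i] * validity / total
    return score


# Toy example: two clusters of two points each, joined by one long edge.
edges = [(0, 1, 1.0), (1, 2, 4.0), (2, 3, 2.0)]
print(approx_relative_validity(edges, [0, 0, 1, 1]))  # 0.625
```

In a cuML setting, the edges and labels would first need to be copied to host memory; how the fitted estimator exposes its spanning tree is left open here.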

Jun 24 '24 20:06 beckernick
