[FEA] Support for HDBSCAN relative_validity
The CPU hdbscan library provides a relative_validity_ attribute, which is a "fast approximation of the Density Based Cluster Validity (DBCV) score".
It would be nice to support this in cuML, even if the computation ran on the CPU (it's fairly lightweight).
from sklearn.datasets import make_blobs
import cuml
import hdbscan # !pip install hdbscan
N = 1000
K = 10
X, y = make_blobs(
    n_samples=N,
    n_features=K,
    random_state=12
)
# relative_validity_ is only available when gen_min_span_tree=True
clusterer = hdbscan.HDBSCAN(gen_min_span_tree=True)
clusterer.fit(X)
print(clusterer.relative_validity_)
0.8025289855360727
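To illustrate why this should be cheap on the CPU, here is a rough pure-Python sketch of the kind of computation involved: a single pass over the minimum spanning tree edges plus the cluster labels. It is a simplified illustration, not the CPU library's exact implementation. The function name `relative_validity_from_mst`, the `(from, to, distance)` edge format, and the fallback used for clusters with no edge to another cluster are all assumptions made for this example.

```python
import math


def relative_validity_from_mst(mst_edges, labels):
    """Simplified DBCV-style score from MST edges and cluster labels.

    mst_edges: iterable of (i, j, distance) tuples (hypothetical format).
    labels: list of ints, with -1 marking noise points.
    """
    n_clusters = max(labels, default=-1) + 1
    if n_clusters == 0:
        return 0.0  # assumption: no clusters gives a score of 0

    # Cluster sizes; noise still counts toward the total number of points.
    sizes = [0] * n_clusters
    for lab in labels:
        if lab >= 0:
            sizes[lab] += 1
    total = len(labels)

    sparseness = [0.0] * n_clusters       # longest edge inside each cluster
    separation = [math.inf] * n_clusters  # shortest edge leaving each cluster
    max_dist = 0.0

    for i, j, dist in mst_edges:
        max_dist = max(max_dist, dist)
        a, b = labels[i], labels[j]
        if a == -1 or b == -1:
            continue  # simplification: edges touching noise are ignored
        if a == b:
            sparseness[a] = max(sparseness[a], dist)
        else:
            separation[a] = min(separation[a], dist)
            separation[b] = min(separation[b], dist)

    score = 0.0
    for c in range(n_clusters):
        # Assumption: a cluster with no edge to another cluster gets a
        # large stand-in separation of twice the longest MST edge.
        sep = separation[c] if math.isfinite(separation[c]) else 2.0 * max_dist
        denom = max(sep, sparseness[c])
        if denom > 0:
            score += (sizes[c] / total) * (sep - sparseness[c]) / denom
    return score


# Toy example: two 3-point chains joined by one long edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 5.0), (3, 4, 1.0), (4, 5, 1.0)]
print(relative_validity_from_mst(edges, [0, 0, 0, 1, 1, 1]))
```

The work is linear in the number of MST edges, so if cuML exposed the tree and labels on the host, a CPU pass like this would add little overhead.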
Similar to https://github.com/rapidsai/cuml/issues/4497