
Density based clustering validity measures

Open lmcinnes opened this issue 8 years ago • 2 comments

Clustering scores like silhouette work well for K-Means but make less sense for density-based clustering techniques like DBSCAN, which support arbitrary cluster shapes. It would be nice to include scores and visualisations for measures that support density-based notions of clustering. These are a little thin on the ground, but the Density Based Cluster Validity Index of Moulavi et al. (http://www.dbs.ifi.lmu.de/~zimek/publications/SDM2014/DBCV.pdf) is one of the better ones.
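To make the silhouette point concrete, here is a minimal pure-Python sketch. The toy data (two long parallel lines of points) and the helper name `silhouette` are assumptions made up for illustration; nothing here is yellowbrick or scikit-learn API. A density-based method with a small enough neighbourhood radius would be expected to recover the two lines, while a centroid-based method tends to cut the data into a left and a right half.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Singleton clusters contribute 0 by convention.
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data: two horizontal lines, 40 points each, 0.25 apart along x, 1.0 apart in y.
points = [(0.25 * i, y) for y in (0.0, 1.0) for i in range(40)]

# Labelling a density-based method would be expected to find: one cluster per line.
by_line = [0] * 40 + [1] * 40
# Labelling a centroid-based method tends to find: left half versus right half.
by_half = [0 if i < 20 else 1 for _ in (0, 1) for i in range(40)]

line_score = silhouette(points, by_line)
half_score = silhouette(points, by_half)
print(f"per-line labelling: {line_score:.3f}")
print(f"left/right labelling: {half_score:.3f}")
```

On this data the left/right split gets the higher silhouette even though it cuts both lines in half, which is the behaviour a density-aware validity index is meant to avoid.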

lmcinnes, May 21 '17 18:05

Absolutely, that would be awesome. We had to add our own distortion score metric; would you be willing to write up some Python to compute the cluster validity index? Check out distortion_score for the signature and input.

bbengfort, May 22 '17 20:05

I have some code for it here. It depends on hdbscan, but in practice only on mst_linkage_core, which you can replace with any suitable minimum spanning tree code.
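As a rough idea of what "any suitable minimum spanning tree code" could look like, here is a sketch of Prim's algorithm over a dense, symmetric distance matrix. The function name `prim_mst` and the `(from, to, weight)` edge tuples are assumptions for illustration; they are not meant to mirror the hdbscan function's signature or return format, so some adapting would likely be needed.

```python
from math import dist, inf


def prim_mst(distance_matrix):
    """Return MST edges as (from, to, weight) tuples using Prim's algorithm.

    distance_matrix is a dense, symmetric list of lists (or any 2-D
    indexable) of pairwise distances. Runs in O(n^2) time.
    """
    n = len(distance_matrix)
    in_tree = [False] * n
    best_dist = [inf] * n   # cheapest known edge weight linking each vertex to the tree
    best_from = [-1] * n    # tree endpoint of that cheapest edge
    best_dist[0] = 0.0      # start growing the tree from vertex 0
    edges = []

    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((v for v in range(n) if not in_tree[v]), key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Relax the attachment cost of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v] and distance_matrix[u][v] < best_dist[v]:
                best_dist[v] = distance_matrix[u][v]
                best_from[v] = u
    return edges


# Usage: four points in the plane with plain Euclidean distances.
pts = [(0, 0), (1, 0), (3, 0), (3, 4)]
matrix = [[dist(p, q) for q in pts] for p in pts]
edges = prim_mst(matrix)
print(edges)
```

The sketch accepts any precomputed distance matrix, so the choice of distance is left to the caller.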

lmcinnes, May 23 '17 00:05