How to get sub-clusters or super-cluster of a cluster?

Open JunFang-NWPU opened this issue 4 years ago • 5 comments

Hi,

I obtained some clusters using hdbscan. Some clusters contain too many points, and some contain too few. I know I can find the sub-clusters or the super-cluster of a cluster from the hierarchy (e.g., condensed_tree_), but it does not seem to be an easy task.

Can someone show how to achieve this? That is, given a cluster label, find its sub-clusters and its super-cluster.

Thanks.

JunFang-NWPU avatar Jul 11 '20 03:07 JunFang-NWPU

You need to get a mapping from the cluster labels in the output to node ids in the condensed tree. From there it is just a matter of following the tree (finding the parent that has this node as a child, or looking at the children of this node). If you look through the code of get_clusters in hdbscan.plots you can see one approach to getting this sort of label mapping.

lmcinnes avatar Jul 11 '20 14:07 lmcinnes

Yes, the key is the mapping from labels to ids in the condensed tree. Thanks. I will give it a try.

JunFang-NWPU avatar Jul 12 '20 03:07 JunFang-NWPU

@lmcinnes would you please be able to share a small snippet that does this? it would be super helpful! 🙏🏽

there are a few open issues regarding this cluster mapping and sub cluster topic - https://github.com/scikit-learn-contrib/hdbscan/issues/451, https://github.com/scikit-learn-contrib/hdbscan/issues/442

salman1993 avatar Jan 22 '21 17:01 salman1993
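
For anyone landing here, below is a minimal sketch of the approach described above. It is not the library's own code. It recovers the label-to-node mapping indirectly instead of reading it from hdbscan internals. The helper names (build_tree_index, label_to_node, super_cluster, sub_clusters) are made up for illustration. The assumptions are: rows look like those of clusterer.condensed_tree_.to_numpy() (fields parent, child, child_size), rows with child_size == 1 are single data points, and the root itself is not a selected cluster (i.e. allow_single_cluster is left off). Under those assumptions, the node behind a label is the lowest common ancestor of the tree nodes that its points fell out of.

```python
from collections import defaultdict


def build_tree_index(tree_rows):
    """Split condensed-tree rows into point->cluster and cluster->cluster links.

    Each row must support row["parent"], row["child"] and row["child_size"]
    (rows of a NumPy structured array or plain dicts both work).
    """
    point_parent = {}                     # data point index -> cluster node id
    cluster_parent = {}                   # cluster node id -> parent node id
    cluster_children = defaultdict(list)  # cluster node id -> child node ids
    for row in tree_rows:
        parent, child = int(row["parent"]), int(row["child"])
        if int(row["child_size"]) == 1:
            # Assumption: a child of size 1 is a single data point.
            point_parent[child] = parent
        else:
            cluster_parent[child] = parent
            cluster_children[parent].append(child)
    return point_parent, cluster_parent, cluster_children


def ancestors(node, cluster_parent):
    """Return [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in cluster_parent:
        chain.append(cluster_parent[chain[-1]])
    return chain


def label_to_node(tree_rows, labels):
    """Map each non-noise label to a condensed-tree node id.

    The node is taken to be the lowest common ancestor of the nodes that
    the label's points are attached to (an assumption, see the lead-in).
    """
    point_parent, cluster_parent, _ = build_tree_index(tree_rows)
    nodes_by_label = defaultdict(set)
    for point, label in enumerate(labels):
        if label >= 0:  # -1 marks noise
            nodes_by_label[int(label)].add(point_parent[point])

    mapping = {}
    for label, nodes in nodes_by_label.items():
        chains = [ancestors(n, cluster_parent) for n in nodes]
        common = set(chains[0]).intersection(*chains[1:])
        # Walking upward, the first shared node is the lowest common ancestor.
        mapping[label] = next(n for n in chains[0] if n in common)
    return mapping


def super_cluster(node, cluster_parent):
    """Parent node id, or None if the node is the root."""
    return cluster_parent.get(node)


def sub_clusters(node, cluster_children):
    """Child cluster node ids (empty list for a leaf cluster)."""
    return list(cluster_children.get(node, []))
```

With a fitted clusterer, the inputs would be tree_rows = clusterer.condensed_tree_.to_numpy() and labels = clusterer.labels_. Note that the results are condensed-tree node ids, not labels: a parent or child node usually has no label of its own, because only the selected nodes are assigned labels.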