hdbscan
How to get sub-clusters or super-cluster of a cluster?
Hi,
I obtained some clusters using hdbscan. Some clusters contain too many points, and some contain too few. I know I can find the sub-clusters or super-cluster of a cluster from its hierarchy (e.g., condensed_tree_), but it does not seem to be an easy task.
Can someone show how to do this? Given a cluster label, find its sub-clusters and super-cluster.
Thanks.
You need to get a mapping from the cluster labels in the output to ids in the condensed tree. From there it is just a matter of following the tree (finding the parent that has this node as a child, or looking at the children of this node). If you look through the code of get_clusters in hdbscan.plots, you can see one approach to getting this sort of label mapping.
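The label-to-node mapping and tree walk described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the library's own method: it does not reproduce the mapping used in get_clusters, but swaps in a different heuristic that takes the node for a label to be the deepest common ancestor of the condensed-tree clusters its points fell out of. It assumes the condensed tree is given as (parent, child, lambda_val, child_size) rows, and that rows with child_size > 1 are cluster-to-cluster links while rows with child_size == 1 record single points. All function names (build_cluster_tree, label_to_node, super_cluster, sub_clusters, points_under) are hypothetical helpers, not part of the hdbscan API.

```python
from collections import defaultdict


def build_cluster_tree(rows):
    """Index condensed-tree rows of the form (parent, child, lambda_val, child_size).

    Assumption: rows with child_size > 1 link a cluster to a sub-cluster,
    and rows with child_size == 1 record the cluster a single point fell out of.
    """
    parent_of = {}                   # cluster id -> parent cluster id
    children_of = defaultdict(list)  # cluster id -> list of child cluster ids
    point_parent = {}                # point index -> cluster it fell out of
    for parent, child, _lambda_val, child_size in rows:
        parent, child = int(parent), int(child)
        if child_size > 1:
            parent_of[child] = parent
            children_of[parent].append(child)
        else:
            point_parent[child] = parent
    return parent_of, dict(children_of), point_parent


def ancestors(node, parent_of):
    """Path from `node` up to the root: [node, parent, grandparent, ...]."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def label_to_node(labels, parent_of, point_parent):
    """Map each non-noise label to a condensed-tree cluster id.

    Hypothetical heuristic: a label's node is the deepest common ancestor
    of the clusters that its points fell out of.
    """
    fell_from = defaultdict(set)
    for i, label in enumerate(labels):
        if label != -1:
            fell_from[int(label)].add(point_parent[i])
    mapping = {}
    for label, nodes in fell_from.items():
        paths = [ancestors(n, parent_of) for n in nodes]
        common = set(paths[0]).intersection(*paths[1:])
        # Walking any path root-ward, the first shared node is the deepest one.
        mapping[label] = next(n for n in paths[0] if n in common)
    return mapping


def super_cluster(node, parent_of):
    """Immediate parent cluster id, or None if `node` is the root."""
    return parent_of.get(node)


def sub_clusters(node, children_of):
    """All descendant cluster ids of `node`, breadth-first."""
    found, queue = [], list(children_of.get(node, []))
    while queue:
        child = queue.pop(0)
        found.append(child)
        queue.extend(children_of.get(child, []))
    return found


def points_under(node, children_of, point_parent):
    """Indices of all points that fell out of `node` or any of its sub-clusters."""
    nodes = {node, *sub_clusters(node, children_of)}
    return sorted(p for p, c in point_parent.items() if c in nodes)


if __name__ == "__main__":
    # Toy tree: 6 points (ids 0-5), root cluster 6 splits into 7 and 8,
    # and cluster 7 splits again into 9 and 10.
    toy_rows = [
        (6, 7, 0.5, 4), (6, 8, 0.5, 2),
        (7, 9, 1.0, 2), (7, 10, 1.0, 2),
        (9, 0, 2.0, 1), (9, 1, 2.0, 1),
        (10, 2, 2.0, 1), (10, 3, 2.0, 1),
        (8, 4, 1.5, 1), (8, 5, 1.5, 1),
    ]
    toy_labels = [0, 0, 0, 0, 1, 1]
    parent_of, children_of, point_parent = build_cluster_tree(toy_rows)
    node = label_to_node(toy_labels, parent_of, point_parent)[0]
    # prints: 7 6 [9, 10]
    print(node, super_cluster(node, parent_of), sub_clusters(node, children_of))
```

With a fitted clusterer, the inputs would presumably be clusterer.condensed_tree_.to_numpy().tolist() and clusterer.labels_. Note that points_under on a super-cluster can include points that received other labels or were marked as noise (-1), since the parent also contains its other children and the points that fell out of it directly.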
Yes, the key is the mapping from labels to ids in the condensed tree. Thanks, I will give it a try.
@lmcinnes, would you be able to share a small snippet that does this? It would be super helpful! 🙏🏽
There are a few open issues regarding this cluster mapping and sub-cluster topic: https://github.com/scikit-learn-contrib/hdbscan/issues/451, https://github.com/scikit-learn-contrib/hdbscan/issues/442