
`allow_single_cluster=True` gets overwritten for `cluster_selection_method=leaf`

Status: Open. divyegala opened this issue 4 years ago (3 comments).

If the cluster tree is empty and `allow_single_cluster=True`, the root is chosen as a cluster here: https://github.com/scikit-learn-contrib/hdbscan/blob/master/hdbscan/_hdbscan_tree.pyx#L780

Next, if `cluster_selection_epsilon == 0.0`, the selected clusters are overwritten with the leaf set, which is empty: https://github.com/scikit-learn-contrib/hdbscan/blob/4052692af994610adc9f72486a47c905dd527c94/hdbscan/_hdbscan_tree.pyx#L785

Finally, the root ends up being deselected here: https://github.com/scikit-learn-contrib/hdbscan/blob/4052692af994610adc9f72486a47c905dd527c94/hdbscan/_hdbscan_tree.pyx#L788
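The three steps above can be reproduced on a simplified model of the leaf-selection path. This is a hypothetical sketch, not the hdbscan source: the function name `select_leaf_clusters_buggy`, the `is_cluster` dict, the `leaves` set and the `root` id are illustrative assumptions, and only the `cluster_selection_epsilon == 0.0` branch is modeled.

```python
# Hypothetical, simplified model of the "leaf" selection path discussed in
# this issue; it is NOT the hdbscan source. Assumed names:
#   is_cluster: dict mapping node id -> bool (is the node selected?)
#   leaves:     set of leaf node ids of the condensed cluster tree
#   root:       id of the root node
# Only the cluster_selection_epsilon == 0.0 branch is modeled.


def select_leaf_clusters_buggy(is_cluster, leaves, root, allow_single_cluster):
    # Step 1: the tree has no leaves, so fall back to the root as the
    # single cluster.
    if len(leaves) == 0 and allow_single_cluster:
        for c in is_cluster:
            is_cluster[c] = False
        is_cluster[root] = True

    # Step 2: with epsilon == 0.0 the selection is simply the leaf set,
    # which is empty here.
    selected_clusters = leaves

    # Step 3: every node outside selected_clusters is deselected,
    # which silently undoes Step 1 for the root.
    for c in is_cluster:
        is_cluster[c] = c in selected_clusters
    return is_cluster


result = select_leaf_clusters_buggy({0: True}, set(), root=0, allow_single_cluster=True)
print(result)  # {0: False}: the root chosen in Step 1 is lost
```

Because Step 3 rebuilds `is_cluster` purely from `selected_clusters`, any earlier special case that is not reflected in that set is discarded.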

Is this intended behavior, or should the root remain selected as a cluster in this case?

divyegala, May 17 '21 22:05

I believe this is a bug, and not the intended behaviour.

lmcinnes, May 18 '21 14:05

@lmcinnes, what is the expected behavior? I'm happy to submit a patch/fix.

divyegala, May 18 '21 17:05

If `allow_single_cluster` is true, then presumably the root should not be removed as a candidate. If there is simply a single cluster that slowly decays (e.g. root == leaf), then it should return the same result as if the "eom" method had selected the root cluster. A patch would be most welcome! Thank you!
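Following that suggestion, a patch might keep the root as the sole selected cluster when the leaf set is empty and `allow_single_cluster` is true. The sketch below is a hypothetical illustration on a simplified model: the function name `select_leaf_clusters_fixed`, the `is_cluster` dict, the `leaves` set and the `root` id are assumptions, not hdbscan's actual data structures, and only the `cluster_selection_epsilon == 0.0` path is covered. It also assumes that with `allow_single_cluster=False` and no leaves, nothing should be selected.

```python
# Hypothetical sketch of a fix on a simplified model; it is NOT the
# hdbscan source. Assumed names:
#   is_cluster: dict mapping node id -> bool
#   leaves:     set of leaf node ids of the condensed cluster tree
#   root:       id of the root node
# Only the cluster_selection_epsilon == 0.0 branch is modeled.


def select_leaf_clusters_fixed(is_cluster, leaves, root, allow_single_cluster):
    if len(leaves) == 0:
        # No leaves: keep the root as the sole candidate when single-cluster
        # results are allowed (mirroring "eom" selecting the root);
        # otherwise select nothing (an assumption of this sketch).
        selected_clusters = {root} if allow_single_cluster else set()
    else:
        # Normal case: with epsilon == 0.0 the leaves are the selection.
        selected_clusters = leaves

    # Single place that writes is_cluster, driven only by selected_clusters.
    for c in is_cluster:
        is_cluster[c] = c in selected_clusters
    return is_cluster


print(select_leaf_clusters_fixed({0: True}, set(), root=0, allow_single_cluster=True))
# {0: True}
print(select_leaf_clusters_fixed({0: True}, set(), root=0, allow_single_cluster=False))
# {0: False}
print(select_leaf_clusters_fixed({0: False, 1: False, 2: False}, {1, 2}, root=0,
                                 allow_single_cluster=True))
# {0: False, 1: True, 2: True}
```

Folding the empty-leaf case into `selected_clusters` means the root can no longer be selected in one step and cleared in the next, since the final loop is the only writer.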

lmcinnes, May 20 '21 15:05