DBSCAN hybrid mode with the cluster_selection_epsilon does not support soft clustering on out of sample data

Open Dicksonchin93 opened this issue 3 years ago • 2 comments

DBSCAN hybrid mode, i.e. the cluster_selection_epsilon parameter set to a value greater than 0, does not support soft clustering on out-of-sample data.

We don't utilise cluster_selection_epsilon anywhere in the membership_vector method in https://github.com/scikit-learn-contrib/hdbscan/blob/4c432505f4a92884a64a77159664f041a583fbec/hdbscan/prediction.py#L518

To add support, I suggest applying the same cluster_selection_epsilon logic that is used during fitting in the select_clusters method, which is called here: https://github.com/scikit-learn-contrib/hdbscan/blob/4c432505f4a92884a64a77159664f041a583fbec/hdbscan/prediction.py#L550

https://github.com/scikit-learn-contrib/hdbscan/blob/4c432505f4a92884a64a77159664f041a583fbec/hdbscan/plots.py#L234
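To make the proposal concrete, here is a toy sketch of the kind of epsilon-based merging that would have to be applied to the selected clusters before computing memberships. It is not hdbscan's code: the dictionary layout (`parent_of`, `birth_lambda`) and the function names `climb` and `select_with_epsilon` are hypothetical, and it assumes a cluster's birth distance is `1 / lambda` and that the root is never selected.

```python
def climb(cluster, parent_of, birth_lambda, epsilon):
    """Walk up from `cluster` to the first ancestor born at a distance > epsilon.

    Hypothetical layout: parent_of maps child cluster -> parent cluster (the
    root has no entry); birth_lambda maps cluster -> lambda at which it split
    off from its parent.
    """
    while cluster in parent_of:
        parent = parent_of[cluster]
        if parent not in parent_of:
            # The parent is the root; this sketch never selects the root.
            return cluster
        if 1.0 / birth_lambda[parent] > epsilon:
            return parent
        cluster = parent
    return cluster


def select_with_epsilon(selected, parent_of, birth_lambda, epsilon):
    """Replace selected clusters born below `epsilon` with a coarser ancestor."""
    result = set()
    for cluster in selected:
        if 1.0 / birth_lambda[cluster] < epsilon:
            result.add(climb(cluster, parent_of, birth_lambda, epsilon))
        else:
            result.add(cluster)

    def has_selected_ancestor(cluster):
        while cluster in parent_of:
            cluster = parent_of[cluster]
            if cluster in result:
                return True
        return False

    # Drop any cluster already covered by a selected ancestor.
    return {c for c in result if not has_selected_ancestor(c)}


# Toy tree: root 0 splits into 1 and 2 at distance 2.0,
# 1 splits into 3 and 4 at distance 0.5, 2 splits into 5 and 6 at distance 0.1.
parent_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
birth_lambda = {1: 0.5, 2: 0.5, 3: 2.0, 4: 2.0, 5: 10.0, 6: 10.0}
leaves = {3, 4, 5, 6}

print(select_with_epsilon(leaves, parent_of, birth_lambda, 0.0))
print(select_with_epsilon(leaves, parent_of, birth_lambda, 0.3))
print(select_with_epsilon(leaves, parent_of, birth_lambda, 1.0))
```

With an epsilon of 0.3, clusters 5 and 6 (born at distance 0.1) collapse into their parent 2 while 3 and 4 survive. Soft clustering of new points would presumably need to score against this merged set rather than the pre-merge selection, which is the gap described above.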

Dicksonchin93 avatar Feb 09 '22 10:02 Dicksonchin93

Yes, I believe this is an interaction of features that is not going to work. Sorry.

lmcinnes avatar Feb 09 '22 13:02 lmcinnes

I'd be happy to make a PR if you are able to review it once it is done. Should I go ahead?

Dicksonchin93 avatar Feb 09 '22 14:02 Dicksonchin93