
Straightforward way to assign every noise sample to its most likely cluster?

Open Asquator opened this issue 1 year ago • 1 comments

My application requires total clustering of all data samples, and I would like to assign all outliers to their adjacent clusters (the dataset is very noisy, and after tweaking the two parameters, at least 1/4 of the samples are marked as outliers).

I want to benefit from the advantages of density-based clustering, but also make a deterministic decision about every point's (approximate) cluster.

It seems we just need to assign every outlier to the cluster of its closest core point. What is the easiest way to do that?
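
To make the idea concrete, here is a minimal sketch of that nearest-point assignment. It assumes Euclidean distance and treats every non-noise point as a candidate, since the `labels_` array alone does not say which points are core points. The function name `assign_noise_to_nearest_cluster` is hypothetical and not part of hdbscan.

```python
import numpy as np

def assign_noise_to_nearest_cluster(X, labels):
    """Return a copy of `labels` where each noise point (-1) takes the
    label of its nearest clustered point (Euclidean distance).

    Hypothetical helper: it approximates "closest core point" by
    "closest non-noise point".
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    noise = labels == -1
    # Nothing to do if there is no noise, or no cluster to assign to.
    if not noise.any() or noise.all():
        return labels.copy()
    clustered = X[~noise]
    # Pairwise squared distances, shape (n_noise, n_clustered).
    diff = X[noise][:, None, :] - clustered[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    nearest = d2.argmin(axis=1)
    out = labels.copy()
    out[noise] = labels[~noise][nearest]
    return out

# Toy data: two tight clusters plus two points marked as noise.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
              [5.1, 5.0], [0.5, 0.2], [4.0, 4.5]])
labels = np.array([0, 0, 1, 1, -1, -1])
full_labels = assign_noise_to_nearest_cluster(X, labels)
print(full_labels)  # [0 0 1 1 0 1]
```

The dense distance matrix grows with the number of noise points times the number of clustered points, so a spatial index would probably be a better fit for large datasets.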

Asquator, Jun 10 '24 02:06

You can try the soft clustering options: https://hdbscan.readthedocs.io/en/latest/soft_clustering.html. That said, there really isn't a magical, straightforward way to do this.
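
As a rough illustration of the soft clustering route, the sketch below turns a membership matrix into hard labels for the noise points only. It assumes the matrix has one row per sample and one column per cluster, with column `j` corresponding to cluster label `j`. The helper name `harden_soft_clustering` and the toy numbers are made up for the example.

```python
import numpy as np

def harden_soft_clustering(labels, membership):
    """Keep existing cluster labels; give each noise point (-1) the
    cluster with the highest membership score in its row.

    Hypothetical helper. Assumes `membership` has shape
    (n_samples, n_clusters) and column j maps to cluster label j.
    """
    labels = np.asarray(labels)
    membership = np.asarray(membership, dtype=float)
    out = labels.copy()
    noise = labels == -1
    out[noise] = membership[noise].argmax(axis=1)
    return out

# Toy input: two clustered points and two noise points.
labels = np.array([0, 1, -1, -1])
membership = np.array([[0.90, 0.00],
                       [0.10, 0.80],
                       [0.30, 0.20],
                       [0.05, 0.40]])
hard_labels = harden_soft_clustering(labels, membership)
print(hard_labels)  # [0 1 0 1]
```

With hdbscan itself, the membership matrix would come from fitting `hdbscan.HDBSCAN(..., prediction_data=True)` and then calling `hdbscan.all_points_membership_vectors(clusterer)`, as the linked soft clustering page describes. Points with uniformly low membership still get a label this way, so it may be worth checking the row maxima before trusting the assignment.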


lmcinnes, Jun 10 '24 02:06