Add new argument for limiting the maximum epsilon

Open prodrigues-tdx opened this issue 3 years ago • 3 comments

This PR aims to introduce to HDBSCAN an argument for a max threshold to the epsilon used when picking the best clusters. With this PR we allow for this new argument, cluster_selection_epsilon_max, to be used in the EOM search method.

This is very useful when domain knowledge tells you from the get-go that the samples in a cluster should not be very far from each other.

For this implementation, we use cluster_selection_epsilon_max in a very similar way to max_cluster_size. This way, clusters with an epsilon bigger than cluster_selection_epsilon_max can still appear if there are no valid clusters below that epsilon. This is, in fact, the exact same behavior as max_cluster_size.
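
To make the intended behavior concrete, here is a minimal, self-contained sketch of an EOM-style selection with an epsilon cap. It is not the code from this PR or from the hdbscan library: the `Node` class, the `select_eom` function, the `birth_epsilon` field (assumed to be the distance at which a cluster appears, i.e. 1 / lambda) and the toy stabilities are all hypothetical, and the root is treated as an ordinary candidate for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical condensed-tree node, used for illustration only."""
    stability: float
    birth_epsilon: float  # assumed: distance at which the cluster appears (1 / lambda)
    children: list = field(default_factory=list)


def select_eom(nodes, root, epsilon_max=float("inf")):
    """Toy Excess-of-Mass selection with an epsilon cap.

    A node loses to its children when their summed stability is larger OR
    when the node's epsilon exceeds epsilon_max. A node without children is
    always kept, so a cluster above the cap can still appear when nothing
    valid exists below it.
    """
    def visit(node_id):
        node = nodes[node_id]
        child_total = 0.0
        child_selected = set()
        for child_id in node.children:
            child_stability, selected = visit(child_id)
            child_total += child_stability
            child_selected |= selected

        too_wide = node.birth_epsilon > epsilon_max
        if node.children and (child_total > node.stability or too_wide):
            # Prefer the descendants and propagate their stability upwards.
            return child_total, child_selected
        return node.stability, {node_id}

    return visit(root)[1]


# Toy tree: 0 -> (1, 2), 2 -> (3, 4). All numbers are made up.
nodes = {
    0: Node(stability=5.0, birth_epsilon=2.0, children=[1, 2]),
    1: Node(stability=1.5, birth_epsilon=0.8),
    2: Node(stability=2.0, birth_epsilon=0.6, children=[3, 4]),
    3: Node(stability=0.4, birth_epsilon=0.3),
    4: Node(stability=0.5, birth_epsilon=0.2),
}

print(select_eom(nodes, 0))                   # no cap
print(select_eom(nodes, 0, epsilon_max=1.0))  # root is too wide
print(select_eom(nodes, 0, epsilon_max=0.5))  # node 2 is too wide, leaf 1 survives
```

In the last call, node 1 has an epsilon above the cap but no children, so it is still selected. This mirrors the fallback described above under the stated assumptions.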

prodrigues-tdx avatar Feb 22 '22 13:02 prodrigues-tdx

Sorry for taking so long to get to this. It looks like a useful addition. Any chance you could add a test to the test suite to check that it works as intended?

lmcinnes avatar Apr 26 '22 14:04 lmcinnes

I totally missed your comment :s I'll do that, yes.

prodrigues-tdx avatar May 09 '22 22:05 prodrigues-tdx