[Question/Feature Request] Re-using single linkage tree from previous clustering to adjust other parameters

Open: ramanan-subramanian opened this issue 3 years ago • 1 comment

Hello,

I have a few questions about a potential use case: I want to keep min_samples fixed and see how the clustering result changes visually when other parameters (such as min_cluster_size and cluster_selection_epsilon) are adjusted. Hence, I have the following questions and request:

  1. It appears that the construction of the single-linkage tree is the most expensive part of the algorithm, and the extraction of the clusters is a relatively cheaper step. Am I correct?
  2. Also, the single-linkage tree doesn't depend on the min_cluster_size and cluster_selection_epsilon. Am I correct?
  3. If both the above are true, would it be possible to have a public API to re-do the assignment of cluster labels post .fit so that users can vary the min_cluster_size and cluster_selection_epsilon parameters and quickly see results?

ramanan-subramanian, Feb 28 '22 04:02

There is a `memory` parameter that uses joblib memory caching to save the single linkage tree computation. As long as you don't change min_samples, calls that specify the same memory cache location should be very cheap (relatively).

lmcinnes, Feb 28 '22 14:02