hdbscan
[Question/Feature Request] Re-using single linkage tree from previous clustering to adjust other parameters
Hello,
I have a few questions regarding a potential use case, where I want to keep min_samples the same and see how the clustering result differs visually when we adjust other parameters (such as the min_cluster_size and cluster_selection_epsilon). Hence, I have the following question/request:
- It appears that the construction of the single-linkage tree is the most expensive part of the algorithm, and the extraction of the clusters is a relatively cheaper step. Am I correct?
- Also, the single-linkage tree doesn't depend on min_cluster_size and cluster_selection_epsilon. Am I correct?
- If both of the above are true, would it be possible to have a public API to redo the assignment of cluster labels after .fit, so that users can vary the min_cluster_size and cluster_selection_epsilon parameters and quickly see results?
There is a memory parameter that uses joblib's memory caching to save the single linkage tree computation. As long as you don't change min_samples, subsequent calls that specify the same memory cache location should be relatively cheap.
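For illustration, here is a minimal sketch of that workflow. The helper names (parameter_grid, sweep), the cache directory ./hdbscan_cache, and the synthetic data are assumptions made for this example and are not part of hdbscan. It assumes that memory accepts a directory path as a string and that min_samples is set explicitly so it stays fixed across runs.

```python
from itertools import product


def parameter_grid(min_cluster_sizes, epsilons):
    """Return every (min_cluster_size, cluster_selection_epsilon) pair to try."""
    return list(product(min_cluster_sizes, epsilons))


def sweep(X, min_samples, grid, cache_dir):
    """Fit HDBSCAN once per grid entry, pointing every fit at the same cache."""
    import hdbscan  # third-party; imported here so parameter_grid has no dependencies

    labels = {}
    for min_cluster_size, epsilon in grid:
        clusterer = hdbscan.HDBSCAN(
            min_samples=min_samples,            # held fixed so the cache can be reused
            min_cluster_size=min_cluster_size,  # varied between runs
            cluster_selection_epsilon=epsilon,  # varied between runs
            memory=cache_dir,                   # same cache location for every fit
        )
        labels[(min_cluster_size, epsilon)] = clusterer.fit_predict(X)
    return labels


if __name__ == "__main__":
    from sklearn.datasets import make_blobs

    # Synthetic data, purely for demonstration.
    X, _ = make_blobs(n_samples=2000, centers=5, random_state=0)
    grid = parameter_grid([5, 15, 50], [0.0, 0.5])
    results = sweep(X, min_samples=10, grid=grid, cache_dir="./hdbscan_cache")
    for params, run_labels in results.items():
        # Label -1 marks noise points, so it is not counted as a cluster.
        n_clusters = len(set(run_labels)) - (1 if -1 in run_labels else 0)
        print(params, n_clusters)
```

If the caching behaves as described above, only the first fit should pay for the tree construction, and the later fits should mostly spend their time on cluster extraction.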