RandomizedSearchCV: All estimators failed to fit hdbscan
I have clustered my data with hdbscan and everything works. I now want to evaluate/validate the clusters with hyperparameter tuning, using the following code. The matrix passed is a dissimilarity matrix that was already computed with a metric not available in HDBSCAN, which is why it is precomputed.
import hdbscan
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import make_scorer
import logging
# mp_matrix is the precomputed dissimilarity matrix
model2 = hdbscan.HDBSCAN(metric='precomputed').fit(mp_matrix)
param_dist = {'min_samples': [1, 2, 3],  # ,14,21,28,35,70],
              'cluster_selection_method': ['eom', 'leaf']}
validity_scorer = make_scorer(hdbscan.validity.validity_index, greater_is_better=True)
SEED = 42
n_iter_search = 20
random_search = RandomizedSearchCV(model2, param_distributions=param_dist, n_iter=n_iter_search, scoring=validity_scorer, random_state=SEED)
random_search.fit(mp_matrix)
print(f"Best Parameters {random_search.best_params_}")
I get the following traceback:
NotFittedError Traceback (most recent call last)
Input In [78], in <cell line: 18>()
16 n_iter_search = 20
17 random_search = RandomizedSearchCV(model2, param_distributions=param_dist, n_iter=n_iter_search, scoring=validity_scorer, random_state=SEED)
---> 18 random_search.fit(mp_matrix)
File D:\Other\Apps\Anaconda\Lib\site-packages\sklearn\model_selection\_search.py:891, in BaseSearchCV.fit(self, X, y, groups, **fit_params)
885 results = self._format_results(
886 all_candidate_params, n_splits, all_out, all_more_results
887 )
889 return results
--> 891 self._run_search(evaluate_candidates)
893 # multimetric is determined here because in the case of a callable
894 # self.scoring the return type is only known after calling
895 first_test_score = all_out[0]["test_scores"]
File D:\Other\Apps\Anaconda\Lib\site-packages\sklearn\model_selection\_search.py:1766, in RandomizedSearchCV._run_search(self, evaluate_candidates)
1764 def _run_search(self, evaluate_candidates):
1765 """Search n_iter candidates from param_distributions"""
-> 1766 evaluate_candidates(
1767 ParameterSampler(
1768 self.param_distributions, self.n_iter, random_state=self.random_state
1769 )
1770 )
File D:\Other\Apps\Anaconda\Lib\site-packages\sklearn\model_selection\_search.py:875, in BaseSearchCV.fit.<locals>.evaluate_candidates(candidate_params, cv, more_results)
870 # For callable self.scoring, the return type is only know after
871 # calling. If the return type is a dictionary, the error scores
872 # can now be inserted with the correct key. The type checking
873 # of out will be done in `_insert_error_scores`.
874 if callable(self.scoring):
--> 875 _insert_error_scores(out, self.error_score)
877 all_candidate_params.extend(candidate_params)
878 all_out.extend(out)
File D:\Other\Apps\Anaconda\Lib\site-packages\sklearn\model_selection\_validation.py:331, in _insert_error_scores(results, error_score)
328 successful_score = result["test_scores"]
330 if successful_score is None:
--> 331 raise NotFittedError("All estimators failed to fit")
333 if isinstance(successful_score, dict):
334 formatted_error = {name: error_score for name in successful_score}
NotFittedError: All estimators failed to fit
I have no idea where the problem lies. Can you help, please?
Note: When I remove metric='precomputed' and instead set gen_min_span_tree=True, which works with metrics other than those computed manually, I get no problem. Why is that, and how can I make the code work with the precomputed metric?
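One possible explanation, stated as an assumption rather than something the traceback confirms: RandomizedSearchCV builds each fold by selecting rows of the input. If the estimator is not recognised as taking pairwise input, a square precomputed matrix would be sliced by rows only, so each fold would receive a non-square matrix that a fit with metric='precomputed' cannot accept. The sketch below uses a hypothetical 6x6 matrix and hypothetical fold indices (neither comes from my data) and plain Python lists, just to show the shape problem:

```python
# Hypothetical 6x6 precomputed dissimilarity matrix (symmetric, zero diagonal).
n = 6
dist = [[abs(i - j) for j in range(n)] for i in range(n)]

# Hypothetical training indices, as a cross-validation split might select them.
train_idx = [0, 1, 2, 3]

# Row-only indexing: the result has 4 rows but still 6 columns, so it is
# no longer a square sample-by-sample matrix.
rows_only = [dist[i] for i in train_idx]

# Indexing rows and columns with the same indices keeps a square 4x4 matrix.
rows_and_cols = [[dist[i][j] for j in train_idx] for i in train_idx]


def shape(matrix):
    """Return (number of rows, number of columns) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))


print(shape(rows_only))
print(shape(rows_and_cols))
```

If this is indeed the cause, it would also be consistent with the non-precomputed run working, since a feature matrix can be split by rows without losing its meaning.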