
[Question] My Question?

Open Estefano13 opened this issue 1 year ago • 0 comments

I'm trying to produce learning curves for a couple of models. The models were initially trained and saved using pickle. After loading them and refitting on the whole training set, I experience no issues. However, when I attempt to refit the models on subsets of the training data, the refit fails with: ValueError: SelectClassificationRates removed all features.

My data is an array of floats with no negative values. Because I am trying to get scores at different training set sizes, the training subsets sometimes have more features than samples. Could this be responsible for the problem? The model was originally fitted on the whole training set using GroupKFold as the cv strategy.
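To illustrate how sample count alone might matter, here is a minimal sketch. It assumes the failing step behaves like scikit-learn's GenericUnivariateSelect in "fpr" mode; the chi2 score function, the alpha of 0.05, the toy data, and the helper name `n_selected` are all assumptions for illustration, not the configuration of the saved pipeline. The class/feature relationship is identical at both sizes, yet the feature passes the significance test only with the larger sample.

```python
import numpy as np
from sklearn.feature_selection import GenericUnivariateSelect, chi2


def n_selected(n_repeats, alpha=0.05):
    """Count features kept by an FPR-based univariate selector.

    The toy data repeats one two-row pattern, so only the sample
    count changes between calls, not the underlying relationship.
    """
    X = np.tile([[2.0], [1.0]], (n_repeats, 1))  # one non-negative feature
    y = np.tile([1, 0], n_repeats)               # alternating class labels
    selector = GenericUnivariateSelect(score_func=chi2, mode="fpr", param=alpha)
    selector.fit(X, y)
    return int(selector.get_support().sum())


kept_large = n_selected(20)  # 40 samples
kept_small = n_selected(5)   # 10 samples
print(kept_large, kept_small)  # prints: 1 0
```

With 10 samples the p-value stays above alpha, so the selector keeps nothing, which is the condition that the check in select_rates_classification.py turns into the ValueError.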

I've observed that the failure depends on which subset is used for training, and it occurs more often with smaller subsets. Here is the error output:

ValueError                                Traceback (most recent call last)
in <cell line: 1>()
----> 1 learning_curve_AutoML(my_model_, X_train.to_numpy(), y_train, groups = X_amplitude_train["Sample ID"], cv=5, scoring = "balanced_accuracy", train_sizes=np.linspace(.3, 1.0, 5))

11 frames
in learning_curve_AutoML(estimator_, X, y, groups, train_sizes, scoring, cv, ret_data)
     36
     37 print(X_train_0.shape)
---> 38 estimator_.refit(X_train_0, y_train_0)
     39
     40 train_scores = score(estimator, X_train_0, y_train_0, scorer)

/usr/local/lib/python3.10/dist-packages/autosklearn/estimators.py in refit(self, X, y)
    792
    793 """
--> 794 self.automl_.refit(X, y)
    795 return self
    796

/usr/local/lib/python3.10/dist-packages/autosklearn/automl.py in refit(self, X, y, max_reshuffles)
   1214
   1215 if i == (max_reshuffles - 1):
-> 1216     raise e
   1217
   1218 self._can_predict = True

/usr/local/lib/python3.10/dist-packages/autosklearn/automl.py in refit(self, X, y, max_reshuffles)
   1194 try:
   1195     if self._budget_type is None:
-> 1196         _fit_and_suppress_warnings(self._logger, model, X, y)
   1197     else:
   1198         _fit_with_budget(

/usr/local/lib/python3.10/dist-packages/autosklearn/evaluation/abstract_evaluator.py in _fit_and_suppress_warnings(logger, model, X, y)
    186 with warnings.catch_warnings():
    187     warnings.showwarning = send_warnings_to_log
--> 188     model.fit(X, y)
    189
    190 return model

/usr/local/lib/python3.10/dist-packages/autosklearn/pipeline/base.py in fit(self, X, y, **fit_params)
    122 a classification algorithm first.
    123 """
--> 124 X, fit_params = self.fit_transformer(X, y, **fit_params)
    125 self.fit_estimator(X, y, **fit_params)
    126 return self

/usr/local/lib/python3.10/dist-packages/autosklearn/pipeline/classification.py in fit_transformer(self, X, y, fit_params)
    121 fit_params.update(_fit_params)
    122
--> 123 X, fit_params = super().fit_transformer(X, y, fit_params=fit_params)
    124
    125 return X, fit_params

/usr/local/lib/python3.10/dist-packages/autosklearn/pipeline/base.py in fit_transformer(self, X, y, fit_params)
    134 }
    135 fit_params_steps = self._check_fit_params(**fit_params)
--> 136 Xt = self._fit(X, y, **fit_params_steps)
    137 return Xt, fit_params_steps[self.steps[-1][0]]
    138

/usr/local/lib/python3.10/dist-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params_steps)
    301 cloned_transformer = clone(transformer)
    302 # Fit or load from cache the current transformer
--> 303 X, fitted_transformer = fit_transform_one_cached(
    304     cloned_transformer, X, y, None,
    305     message_clsname='Pipeline',

/usr/local/lib/python3.10/dist-packages/joblib/memory.py in __call__(self, *args, **kwargs)
    310
    311 def __call__(self, *args, **kwargs):
--> 312     return self.func(*args, **kwargs)
    313
    314 def call_and_shelve(self, *args, **kwargs):

/usr/local/lib/python3.10/dist-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
    754     res = transformer.fit_transform(X, y, **fit_params)
    755 else:
--> 756     res = transformer.fit(X, y, **fit_params).transform(X)
    757
    758 if weight is None:

/usr/local/lib/python3.10/dist-packages/autosklearn/pipeline/components/feature_preprocessing/select_rates_classification.py in transform(self, X)
     94
     95 if Xt.shape[1] == 0:
---> 96     raise ValueError("%s removed all features." % self.__class__.__name__)
     97 return Xt
     98

ValueError: SelectClassificationRates removed all features.

System Details

I am running this on Google Colab.

Python version: 3.10.12
auto-sklearn version: 0.15.0
scikit-learn version: 0.24.2

Is this working as intended? Any suggestions as to how to avoid this problem in the future? Should I just exclude the problematic feature preprocessing step?
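For reference, excluding the step would look roughly like the sketch below. It assumes the `exclude` argument of AutoSklearnClassifier accepts a dict keyed by "feature_preprocessor" and that the component is registered under the name of its module, select_rates_classification; both should be checked against the installed version. It would apply only to a fresh search, not to a model that has already been pickled.

```python
# Hypothetical search-time configuration (names are assumptions, see above).
automl_kwargs = {
    "exclude": {"feature_preprocessor": ["select_rates_classification"]},
}

# Assumed usage, which requires auto-sklearn to be installed:
# from autosklearn.classification import AutoSklearnClassifier
# automl = AutoSklearnClassifier(**automl_kwargs)
print(automl_kwargs)
```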

Estefano13 · Aug 22 '24 12:08