evalml

AutoMLSearchException: All pipelines in the current AutoML batch produced a score of np.nan on the primary objective

Open glacierck opened this issue 2 years ago • 3 comments

I just changed problem_type="binary" to "multiclass".


* Beginning pipeline search *

Optimizing for Log Loss Multiclass. Lower score is better.

Using SequentialEngine to train and score pipelines. Searching up to 3 batches for a total of None pipelines. Allowed model families:

Evaluating Baseline Pipeline: Mode Baseline Multiclass Classification Pipeline

		Mode Baseline Multiclass Classification Pipeline fold 0: Encountered an error.
		Mode Baseline Multiclass Classification Pipeline fold 0: All scores will be replaced with nan.
		Fold 0: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 0: Parameters:
{'Label Encoder': {'positive_label': None}, 'Baseline Classifier': {'strategy': 'mode'}}
		Fold 0: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

		Mode Baseline Multiclass Classification Pipeline fold 1: Encountered an error.
		Mode Baseline Multiclass Classification Pipeline fold 1: All scores will be replaced with nan.
		Fold 1: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 1: Parameters:
{'Label Encoder': {'positive_label': None}, 'Baseline Classifier': {'strategy': 'mode'}}
		Fold 1: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

		Mode Baseline Multiclass Classification Pipeline fold 2: Encountered an error.
		Mode Baseline Multiclass Classification Pipeline fold 2: All scores will be replaced with nan.
		Fold 2: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 2: Parameters:
{'Label Encoder': {'positive_label': None}, 'Baseline Classifier': {'strategy': 'mode'}}
		Fold 2: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

Mode Baseline Multiclass Classification Pipeline:
		Starting cross validation
		Finished cross validation - mean Log Loss Multiclass: nan


* Evaluating Batch Number 1 *

		Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder + Standard Scaler fold 0: Encountered an error.
		Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder + Standard Scaler fold 0: All scores will be replaced with nan.
		Fold 0: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 0: Parameters:
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'}}
		Fold 0: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

		Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder + Standard Scaler fold 1: Encountered an error.
		Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder + Standard Scaler fold 1: All scores will be replaced with nan.
		Fold 1: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 1: Parameters:
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'}}
		Fold 1: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

		Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder + Standard Scaler fold 2: Encountered an error.
		Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder + Standard Scaler fold 2: All scores will be replaced with nan.
		Fold 2: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 2: Parameters:
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'}}
		Fold 2: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder + Standard Scaler:
		Starting cross validation
		Finished cross validation - mean Log Loss Multiclass: nan

		Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder fold 0: Encountered an error.
		Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder fold 0: All scores will be replaced with nan.
		Fold 0: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 0: Parameters:
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
		Fold 0: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

		Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder fold 1: Encountered an error.
		Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder fold 1: All scores will be replaced with nan.
		Fold 1: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 1: Parameters:
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
		Fold 1: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

		Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder fold 2: Encountered an error.
		Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder fold 2: All scores will be replaced with nan.
		Fold 2: Exception during automl search: Multiclass pipelines require y to have 3 or more unique classes!
		Fold 2: Parameters:
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
		Fold 2: Traceback:

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 238, in _train_and_score fitted_pipeline, hashes = train_pipeline(

File "D:\conda\envs\gradio\lib\site-packages\evalml\automl\engine\engine_base.py", line 176, in train_pipeline cv_pipeline.fit(X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\utils\base_meta.py", line 19, in _set_fit return_value = method(self, X, y)

File "D:\conda\envs\gradio\lib\site-packages\evalml\pipelines\classification_pipeline.py", line 66, in fit raise ValueError(

Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + One Hot Encoder:
		Starting cross validation
		Finished cross validation - mean Log Loss Multiclass: nan

AutoMLSearchException                     Traceback (most recent call last)
Cell In [20], line 1
----> 1 automl.search(interactive_plot=False)

File D:\conda\envs\gradio\lib\site-packages\evalml\automl\automl_search.py:1159, in AutoMLSearch.search(self, interactive_plot)
   1152 if (
   1153     len(current_batch_pipeline_scores)
   1154     and current_batch_pipeline_scores.isna().all()
   1155 ):
   1156     error_msgs = set(
   1157         [str(pl_fold["Exception"]) for pl_fold in self.errors.values()],
   1158     )
-> 1159     raise AutoMLSearchException(
   1160         f"All pipelines in the current AutoML batch produced a score of np.nan on the primary objective {self.objective}. Exception(s) raised: {error_msgs}. Check the 'errors' attribute of the AutoMLSearch object for a full breakdown of errors and tracebacks.",
   1161     )
   1162 if len(pipeline_times) > 0:
   1163     pipeline_times["Total time of batch"] = time_elapsed(start_batch_time)

AutoMLSearchException: All pipelines in the current AutoML batch produced a score of np.nan on the primary objective <evalml.objectives.standard_metrics.LogLossMulticlass object at 0x00000274FC966D00>. Exception(s) raised: {'Multiclass pipelines require y to have 3 or more unique classes!'}. Check the 'errors' attribute of the AutoMLSearch object for a full breakdown of errors and tracebacks.
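
The exception message points at the `errors` attribute. Below is a minimal sketch of summarizing it, assuming (as the `search` source quoted in the traceback suggests) that `errors` is a dict whose values each carry an "Exception" entry. The helper `summarize_errors` and the sample dict are hypothetical and not part of evalml.

```python
from collections import Counter


def summarize_errors(errors):
    """Count how often each distinct exception message occurs.

    Assumes `errors` maps some key to a dict with an "Exception" entry,
    mirroring the expression used in AutoMLSearch.search in the traceback.
    """
    return Counter(str(entry["Exception"]) for entry in errors.values())


# Hypothetical stand-in for `automl.errors`; the keys are made up.
MESSAGE = "Multiclass pipelines require y to have 3 or more unique classes!"
sample_errors = {
    "pipeline_a": {"Exception": ValueError(MESSAGE)},
    "pipeline_b": {"Exception": ValueError(MESSAGE)},
}

for message, count in summarize_errors(sample_errors).items():
    print(f"{count}x {message}")
```

If the real attribute has the assumed shape, `summarize_errors(automl.errors)` would show at a glance whether every failure shares one root cause, as it appears to here.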

# Your code here
y_train.dtypes

CategoricalDtype(categories=['<=5%', '>5%'], ordered=False)


from evalml import AutoMLSearch

automl = AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    problem_type="multiclass",
    verbose=True,
)
automl.search(interactive_plot=False)
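
The target above has only two categories ('<=5%' and '>5%'), which matches the error text "Multiclass pipelines require y to have 3 or more unique classes!". A small sketch of choosing the problem type from the number of distinct labels follows. `choose_problem_type` is a hypothetical helper, not an evalml function, and it assumes the labels contain no missing values.

```python
def choose_problem_type(labels):
    """Return "binary" or "multiclass" based on the number of distinct labels.

    Hypothetical helper; assumes `labels` is an iterable with no missing values.
    """
    classes = set(labels)
    if len(classes) < 2:
        raise ValueError("classification needs at least 2 unique classes")
    return "binary" if len(classes) == 2 else "multiclass"


# Labels taken from the CategoricalDtype shown above.
print(choose_problem_type(["<=5%", ">5%", "<=5%"]))  # binary
```

The result could then be passed along as `problem_type=choose_problem_type(y_train)` instead of hard-coding "multiclass".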

glacierck avatar Sep 29 '22 03:09 glacierck

I have encountered a similar problem. Have you solved it?

MaeChd avatar Nov 17 '22 03:11 MaeChd

@glacierck Do you have some data for us to repro this?

chukarsten avatar Apr 06 '23 15:04 chukarsten

I faced a similar issue with the attached dataset (covid_flu.csv), depending on the random_state used for the train-test split. It works only when I use random_state=42 on my system. I tried it on Google Colab and it didn't work at all.

from evalml import AutoMLSearch
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split


data_path = 'data/covid_flu.csv'
target_column = 'Diagnosis'
data = pd.read_csv(data_path).dropna(subset=target_column)

problem_type = 'binary' 
y = data[target_column]
X = data.drop(columns=target_column)

# Changing the random state to 42 will not give an error on my system
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1, stratify=y)

# Impute numeric columns
num_imputer = SimpleImputer().set_output(transform='pandas')
num_cols = X_train.select_dtypes(include='number').columns
if len(num_cols)>0:
    X_train[num_cols] = num_imputer.fit_transform(X_train[num_cols])
    X_test[num_cols] = num_imputer.transform(X_test[num_cols])
# Impute categorical columns
cat_cols = X_train.select_dtypes(exclude='number').columns
if len(cat_cols)>0:
    imp_mostfrequent = SimpleImputer(strategy='most_frequent')
    print(cat_cols)
    X_train[cat_cols] = imp_mostfrequent.fit_transform(X_train[cat_cols])
    X_test[cat_cols] = imp_mostfrequent.transform(X_test[cat_cols])


automl = AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    X_holdout=X_test,
    y_holdout=y_test,
    problem_type=problem_type,
    objective='auto',
    additional_objectives='f1',
    allowed_model_families=["extra_trees", "linear_model", "random_forest", "lightgbm"],
    max_batches=3,
    automl_algorithm="default",
    ensembling=True,
    verbose=False,
)

automl.search(interactive_plot=False)

Thanks for your help!

zerualem avatar Nov 01 '23 16:11 zerualem