auto-sklearn
Multi-label Classification Scoring "NaN"
I'm trying to get the metrics for my multi-label classification on the Iris dataset, and I'm getting "NaN" for precision, recall, f1, and roc_auc.
Below is my code:
```python
import pandas as pd
import autosklearn.classification
from autosklearn.metrics import precision, recall, f1, roc_auc

# X_train, y_train: the multi-label training data (not shown in the original post)
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=200,
    per_run_time_limit=30,
    scoring_functions=[precision, recall, f1, roc_auc],
    # Below two flags are provided to speed up calculations
    # Not recommended for a real implementation
    initial_configurations_via_metalearning=0,
    smac_scenario_args={'runcount_limit': 1},
)
automl.fit(X_train, y_train)


def get_metric_result(cv_results):
    results = pd.DataFrame.from_dict(cv_results)
    results = results[results['status'] == "Success"]
    cols = ['rank_test_scores', 'param_classifier:__choice__', 'mean_test_score']
    cols.extend([key for key in cv_results.keys() if key.startswith('metric_')])
    return results[cols]
```
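For reference, the helper above would presumably be applied to the fitted model's `cv_results_`; the metric column names here are taken from the reproducible example further down:

```python
results = get_metric_result(automl.cv_results_)
print(results[['metric_precision', 'metric_recall', 'metric_f1', 'metric_roc_auc']])
```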
Hi @timzewing,
I can confirm this happens, and I'll be able to look into it soon. Thanks for reporting this to us, and apologies for the inconvenience.
Fully reproducible example:
```python
import pandas as pd

from autosklearn.classification import AutoSklearnClassifier
from autosklearn.pipeline.util import get_dataset
from autosklearn.metrics import precision, recall, f1, roc_auc

X_train, y_train, X_test, y_test = get_dataset('iris')

automl = AutoSklearnClassifier(
    time_left_for_this_task=30,
    scoring_functions=[precision, recall, f1, roc_auc],
    # Below two flags are provided to speed up calculations
    # Not recommended for a real implementation
    initial_configurations_via_metalearning=0,
    smac_scenario_args={'runcount_limit': 1},
)
automl.fit(X_train, y_train)


def get_metric_result(cv_results):
    results = pd.DataFrame.from_dict(cv_results)
    results = results[results['status'] == "Success"]
    cols = ['rank_test_scores', 'param_classifier:__choice__', 'mean_test_score']
    cols.extend([key for key in cv_results.keys() if key.startswith('metric_')])
    return results[cols]


results = get_metric_result(automl.cv_results_)

metric_columns = ['metric_precision', 'metric_recall', 'metric_f1', 'metric_roc_auc']
assert all(not results[col].isnull().values.any() for col in metric_columns), \
    f"{results[metric_columns]}"
```
Note: the use of `smac_scenario_args` has no effect here; increasing the run count so that multiple models are found also has no effect.
Hello, please kindly help me with these lines of code. I have cleaned and properly preprocessed my dataset; it works well with conventional machine learning, but when I run AutoML the results are NaN in all columns.

Question.docx

```python
df_cv_results = pd.DataFrame(clf.cv_results_).sort_values(by='mean_test_score', ascending=False)
df_cv_results
```

This gives NaN in all columns. What is wrong with this?
```python
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder
from autosklearn.classification import AutoSklearnClassifier
from autosklearn.metrics import (roc_auc, average_precision, accuracy,
                                 f1, precision, recall, log_loss)

skf = StratifiedKFold(n_splits=5)
data = totaset.values
X, y = data[:, :-1], data[:, -1]
# minimally prepare dataset
X = X.astype('float32')
y = LabelEncoder().fit_transform(y.astype('str'))
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

clf = AutoSklearnClassifier(
    time_left_for_this_task=5 * 60,
    max_models_on_disc=5,
    memory_limit=10240,
    resampling_strategy=skf,
    ensemble_size=4,
    metric=average_precision,
    scoring_functions=[roc_auc, average_precision, accuracy, f1, precision, recall, log_loss],
)
clf.fit(X_train, y_train)

df_cv_results = pd.DataFrame(clf.cv_results_).sort_values(by='mean_test_score', ascending=False)
df_cv_results
```
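One way to narrow this down (a minimal diagnostic sketch, assuming `clf` was fitted as above): runs that crash or time out are reported with NaN scores, so inspecting the `status` column of `cv_results_` often explains an all-NaN table.

```python
import pandas as pd

# Count run outcomes; anything other than "Success" (e.g. a timeout or
# memory-limit kill) leaves NaN in the corresponding metric columns.
df = pd.DataFrame(clf.cv_results_)
print(df['status'].value_counts())

# Scores for the runs that actually completed, if any.
print(df.loc[df['status'] == 'Success', 'mean_test_score'])
```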
I think I have a lead on this while investigating other things. It won't be fixed immediately, but I'm leaving this as a note.
The `EnsembleSelector` performs a weighted sum of the models' predictions.
- In the case of regression this is fine.
- In the case of classification, this only makes sense with probabilities. Doing a weighted average of class labels in `(0, n)` or multi-label form won't do as intended (a toy illustration follows this list).
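A toy illustration of that pitfall, with made-up numbers in plain NumPy:

```python
import numpy as np

# Two models' hard label predictions for a 3-class problem.
# A weighted average of the integer labels invents class 1,
# which neither model ever predicted.
labels_a = np.array([0, 2, 2])
labels_b = np.array([2, 2, 0])
print(0.5 * labels_a + 0.5 * labels_b)  # [1. 2. 1.] -- meaningless as classes

# Averaging probabilities instead stays a valid distribution,
# which can then be turned into a prediction per sample.
proba_a = np.array([[0.9, 0.1, 0.0],
                    [0.0, 0.1, 0.9]])
proba_b = np.array([[0.1, 0.1, 0.8],
                    [0.1, 0.0, 0.9]])
avg = 0.5 * proba_a + 0.5 * proba_b
print(avg.argmax(axis=1))  # ensemble prediction: [0 2]
```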
Given that fact, an `EnsembleSelector` for classification must receive the `probability_predictions`, the `labels`, and a metric.
- The `labels` are encoded, and we use the same transformation for both the models and the `labels` given here, hence we can guarantee they are in `(0, n)` or multi-label form and correspond to the arrangement in `probability_predictions`.
- However, it turns out most `sklearn` metrics don't really accept `probabilities` in their signature; they require the probabilities to be converted to labels first. One exception, of course, is `roc_auc_score`, which is the default we use. For example, check out sklearn's `accuracy`: it relies on `predictions` and not `probabilities` (see the sketch after this list).
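The distinction in sketch form, using the standard sklearn metric functions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 1, 1, 0])
proba = np.array([0.2, 0.8, 0.6, 0.4])  # predicted P(class 1)

# roc_auc_score accepts probabilities/scores directly...
print(roc_auc_score(y_true, proba))

# ...but accuracy_score, like most label-based metrics, needs the
# probabilities converted to hard predictions first.
print(accuracy_score(y_true, (proba > 0.5).astype(int)))
```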
So I would expect that using any metric other than `roc_auc_score` ends up with these NaN scores, or even worse, fails silently.
The way forward I see is to treat classification and regression explicitly separately. Before scoring whether a classifier should be added to the ensemble, the probabilities need to be weight-summed and then inverse-transformed, so that all metric types can be supported (a rough sketch follows).
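A minimal sketch of that flow, with hypothetical names (`ensemble_score` and its arguments are illustrative, not auto-sklearn internals):

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical: weight-sum the probability predictions first, then
# inverse-transform (argmax) back to encoded labels, so that
# label-based metrics such as f1 can be computed as well.
def ensemble_score(probability_predictions, weights, y_true, metric=f1_score):
    weighted = sum(w * p for w, p in zip(weights, probability_predictions))
    y_pred = weighted.argmax(axis=1)  # probabilities -> encoded class labels
    return metric(y_true, y_pred, average='macro')
```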
Turns out this was not the issue; these probabilities are converted properly before hitting the metric. I will keep an eye on it.