
Multi-label Classification Scoring "NaN"

Open · timzewing opened this issue Sep 27 '21 · 5 comments

I'm trying to compute metrics for multi-label classification on the Iris dataset, and I'm getting NaN for precision, recall, f1, and roc_auc.

Below is my code:

import pandas as pd

import autosklearn.classification
from autosklearn.metrics import precision, recall, f1, roc_auc

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=200,
    per_run_time_limit=30,
    scoring_functions=[precision, recall, f1, roc_auc],
    # Below two flags are provided to speed up calculations
    # Not recommended for a real implementation
    initial_configurations_via_metalearning=0,
    smac_scenario_args={'runcount_limit': 1},
)
automl.fit(X_train, y_train)


def get_metric_result(cv_results):
    results = pd.DataFrame.from_dict(cv_results)
    results = results[results['status'] == "Success"]
    cols = ['rank_test_scores', 'param_classifier:__choice__', 'mean_test_score']
    cols.extend([key for key in cv_results.keys() if key.startswith('metric_')])
    return results[cols]
[Screenshot: cv_results table with NaN in the metric_precision, metric_recall, metric_f1, and metric_roc_auc columns]

timzewing · Sep 27 '21 19:09

Hi @timzewing,

I can confirm this happens and I'll look into it soon. Thanks for reporting this to us, and apologies for the inconvenience.

Fully reproducible example:

import pandas as pd

from autosklearn.classification import AutoSklearnClassifier
from autosklearn.pipeline.util import get_dataset
from autosklearn.metrics import precision, recall, f1, roc_auc

X_train, y_train, X_test, y_test = get_dataset('iris')

automl = AutoSklearnClassifier(
    time_left_for_this_task=30,
    scoring_functions=[precision, recall, f1, roc_auc],
    # Below two flags are provided to speed up calculations
    # Not recommended for a real implementation
    initial_configurations_via_metalearning=0,
    smac_scenario_args={'runcount_limit': 1}
)
automl.fit(X_train, y_train)

def get_metric_result(cv_results):
    results = pd.DataFrame.from_dict(cv_results)
    results = results[results['status'] == "Success"]
    cols = ['rank_test_scores', 'param_classifier:__choice__', 'mean_test_score']
    cols.extend([key for key in cv_results.keys() if key.startswith('metric_')])
    return results[cols]


results = get_metric_result(automl.cv_results_)

metric_columns = ['metric_precision','metric_recall','metric_f1','metric_roc_auc']

assert all(not results[col].isnull().values.any() for col in metric_columns), \
    f"{results[metric_columns]}"

Note: the use of smac_scenario_args has no effect here, and increasing runcount_limit so that multiple models are found makes no difference either.

eddiebergman · Sep 28 '21 17:09

Hello, please kindly help me with these lines of code. I have cleaned and properly preprocessed my dataset; it works well with conventional machine learning, but when I run AutoML the results are NaN in all columns.

(attached: Question.docx)

df_cv_results = pd.DataFrame(clf.cv_results_).sort_values(by='mean_test_score', ascending=False)
df_cv_results

Yuda2015 · Dec 15 '21 14:12

This gives NaN in all columns. What is wrong with this?

import pandas as pd
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder
from autosklearn.classification import AutoSklearnClassifier
from autosklearn.metrics import roc_auc, average_precision, accuracy, f1, precision, recall, log_loss

skf = StratifiedKFold(n_splits=5)
data = totaset.values  # 'totaset' is the preprocessed DataFrame (defined elsewhere)
X, y = data[:, :-1], data[:, -1]

# minimally prepare dataset
X = X.astype('float32')
y = LabelEncoder().fit_transform(y.astype('str'))

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

clf = AutoSklearnClassifier(
    time_left_for_this_task=5 * 60,
    max_models_on_disc=5,
    memory_limit=10240,
    resampling_strategy=skf,
    ensemble_size=4,
    metric=average_precision,
    scoring_functions=[roc_auc, average_precision, accuracy, f1, precision, recall, log_loss],
)
clf.fit(X_train, y_train)

df_cv_results = pd.DataFrame(clf.cv_results_).sort_values(by='mean_test_score', ascending=False)
df_cv_results

Yuda2015 · Dec 15 '21 15:12

I think I have a lead on this from investigating other things. It won't be fixed immediately, but I'm leaving this as a note.

The EnsembleSelector performs a weighted sum of the models' predictions.

  • In the case of regression this is fine.
  • In the case of classification, this only makes sense with probabilities: taking a weighted average of class labels in (0, n) or of multi-label targets won't behave as intended (see the sketch just below).
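
For intuition, here is a minimal standalone sketch (plain numpy, not auto-sklearn internals) of why averaging hard class labels is meaningless while averaging probabilities stays well-defined:

import numpy as np

# Two members' hard label predictions for 3 samples with classes {0, 1, 2}
preds_a = np.array([0, 2, 1])
preds_b = np.array([2, 2, 1])

# A 50/50 "average" of labels gives 1.0 for the first sample -- a class
# that neither member predicted. Averaging labels is meaningless.
print((preds_a + preds_b) / 2)  # [1. 2. 1.]

# Averaging probabilities is well-defined: the result is still a distribution
proba_a = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.2, 0.6, 0.2]])
proba_b = np.array([[0.1, 0.2, 0.7], [0.0, 0.2, 0.8], [0.1, 0.8, 0.1]])
avg = (proba_a + proba_b) / 2
print(avg.argmax(axis=1))  # labels recovered via argmax: [0 2 1]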

Given that fact, an EnsembleSelector for classification must receive the probability_predictions, the labels, and a metric.

  • The labels are encoded, and we use the same transformation for both the models and the labels given here; hence we can guarantee they are in (0, n) or multi-label form and correspond to the arrangement in probability_predictions.
  • However, it turns out most sklearn metrics don't accept probabilities in their signature; they require the probabilities to be converted to labels first. One exception, of course, is roc_auc_score, which is the default we use. For example, sklearn's accuracy_score relies on label predictions, not probabilities (see the sketch after this list).
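
A quick illustration of that signature difference, using standard sklearn metrics:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 1, 1, 0])
y_proba = np.array([0.2, 0.9, 0.6, 0.4])  # P(class 1)

# roc_auc_score accepts probability scores directly
print(roc_auc_score(y_true, y_proba))  # 1.0

# accuracy_score needs hard labels, so probabilities must be thresholded first
print(accuracy_score(y_true, (y_proba >= 0.5).astype(int)))  # 1.0

# Passing the raw probabilities instead raises:
# ValueError: Classification metrics can't handle a mix of binary and continuous targets
# accuracy_score(y_true, y_proba)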

So I would expect that using any metric other than roc_auc_score ends up with these NaN scores or, even worse, fails silently.

The way forward I see is to treat classification and regression explicitly separately: before scoring the addition of a classifier to the ensemble, the probabilities need to be weight-summed and then inverse-transformed back to labels, so that all metric types can be supported. A rough sketch of this idea follows.
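
Something along these lines (a hypothetical helper, not the actual auto-sklearn implementation; score_ensemble and its signature are illustrative assumptions):

import numpy as np
from sklearn.metrics import accuracy_score

def score_ensemble(weights, probas, y_true, metric=accuracy_score):
    # Weighted sum of member probabilities; probas has shape (n_models, n_samples, n_classes)
    avg_proba = np.average(probas, axis=0, weights=weights)
    # Inverse transform: probabilities -> hard labels, so any label-based metric works
    y_pred = avg_proba.argmax(axis=1)
    return metric(y_true, y_pred)

probas = np.array([
    [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.2, 0.6, 0.2]],  # member A
    [[0.1, 0.2, 0.7], [0.0, 0.2, 0.8], [0.1, 0.8, 0.1]],  # member B
])
print(score_ensemble([0.7, 0.3], probas, y_true=np.array([0, 2, 1])))  # 1.0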

eddiebergman · Feb 17 '22 14:02

Turns out this was not the issue, and these probabilities are converted properly before hitting the metric. I will keep an eye on it.

eddiebergman · Feb 17 '22 17:02