mljar-supervised
                        Error with the Ensemble method in 'Perform' mode
The fit method used:
from supervised.automl import AutoML
path = 'binary_classification_Perform_cv'
cv = [(train_indices, test_indices)]
automl = AutoML(
    mode='Perform',
    ml_task='binary_classification',
    results_path=path,
    total_time_limit=24*3600,
    explain_level=2,
    validation_strategy={"validation_type": "custom"},
)
automl.fit(X=features, y=labels, sample_weight=weights, cv=cv)
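For context, here is a minimal, self-contained sketch of how the custom split above can be constructed; the synthetic arrays are placeholders standing in for the real dataset:
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000
features = rng.normal(size=(n_samples, 45))   # placeholder feature matrix
labels = rng.integers(0, 2, size=n_samples)   # placeholder binary target
weights = np.ones(n_samples)                  # placeholder sample weights

# A single custom split: first half for training, second half for validation.
indices = np.arange(n_samples)
train_indices = indices[: n_samples // 2]
test_indices = indices[n_samples // 2:]
cv = [(train_indices, test_indices)]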
The output of the error.md file:
Input contains NaN, infinity or a value too large for dtype('float64').
Traceback (most recent call last):
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/supervised/base_automl.py", line 1089, in _fit
    is_stacked=params["is_stacked"]
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/supervised/base_automl.py", line 394, in ensemble_step
    self.ensemble.fit(oofs, target, sample_weight)
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/supervised/ensemble.py", line 222, in fit
    score = self.metric(y, y_ens, sample_weight)
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/supervised/utils/metric.py", line 408, in __call__
    return self.metric(y_true, y_predicted, sample_weight=sample_weight)
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/supervised/utils/metric.py", line 25, in logloss
    ll = log_loss(y_true, y_predicted, sample_weight=sample_weight)
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2229, in log_loss
    y_pred = check_array(y_pred, ensure_2d=False)
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 721, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/eos/user/a/azaboren/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 106, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
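For reference, scikit-learn raises this exact ValueError whenever the array passed to log_loss contains a NaN, so a single missing prediction is enough to reproduce it in isolation:
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.9, np.nan, 0.2])  # one missing prediction
# Raises: ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
log_loss(y_true, y_pred)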
I can provide the data used in this example if it is needed to reproduce the issue.
@andrew-zaborenko thank you for reporting the issue. Yes, please send the full code and data as well.
What would be the most convenient way for you to receive the data? I ran my code in a Jupyter Notebook and my dataset is around 110MB in .npy format.
I've created a folder with all the used data and the notebook I ran. Most of the code there is not needed to reproduce the issue, but I left it as is for completeness. It should be possible to reproduce the problem by executing the sixth cell and the ones after it. https://cernbox.cern.ch/index.php/s/9DavDMeDx0zMTTQ
@andrew-zaborenko thank you for the code and data. After a first look, I should be able to run it.
I will look into it on Monday and let you know. BTW, which experiment are you at?
Hi, I've tried using the same method of custom validation on a different, smaller dataset and the same ensemble error occurred, so I don't think this issue is data-specific. I've looked through the docs a bit more and I think the cause is described here: https://supervised.mljar.com/features/algorithms/#stacked-algorithm. The stacked ensemble algorithm works only with k-fold cross-validation, and not with custom validation. Do you think it is possible to implement the ensemble method for custom validation? The ensemble usually works better than any single model. Anyway, I guess I should have read the documentation more extensively before opening an issue :)
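For comparison, a k-fold setup of the kind the stacked ensemble supports would look roughly like this (the dictionary keys follow the validation strategy examples in the mljar docs; the values are just an illustration):
from supervised.automl import AutoML

automl = AutoML(
    mode='Perform',
    ml_task='binary_classification',
    total_time_limit=24*3600,
    validation_strategy={
        "validation_type": "kfold",  # stacking works with k-fold CV
        "k_folds": 5,
        "shuffle": True,
        "stratify": True,
    },
)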
I am working at the CMS Collaboration searching for possible deviations from the Standard Model predictions in top-quark processes. I'm just an undergrad student though :smile:
Hi @andrew-zaborenko,
There can be problems with ensembling under custom validation, but not always; it depends on the validation splits. The stacking ensemble is very fragile and sensitive to target leakage, so stacking should be used only with k-fold cross-validation (for safety).
In your situation the stacking ensemble will not work, but the classic (weighted) ensemble should, so there must be some bug. Give me some time, I'm working on it right now.
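To show what a classic (weighted) ensemble does with the out-of-fold predictions, here is a rough sketch of greedy ensemble selection in the style of Caruana et al.; it is only an illustration, not mljar's actual implementation:
import numpy as np
from sklearn.metrics import log_loss

def greedy_weighted_ensemble(oof_preds, y_true, max_iters=20):
    # oof_preds: list of out-of-fold probability arrays, one per model.
    # Repeatedly (with repetition) add the model that most improves logloss;
    # the per-model counts play the role of the "repeat" weights.
    counts = np.zeros(len(oof_preds))
    running_sum = np.zeros_like(oof_preds[0])
    for _ in range(max_iters):
        scores = [log_loss(y_true, (running_sum + p) / (counts.sum() + 1))
                  for p in oof_preds]
        best = int(np.argmin(scores))
        counts[best] += 1
        running_sum = running_sum + oof_preds[best]
    return counts, running_sum / counts.sum()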
@andrew-zaborenko I loaded your models and trained the ensemble ...
Ensemble logloss 0.497096 trained in 87.69 seconds 
To do this, I needed to edit the progress.json and params.json files.
If you have an example of a bug with a smaller dataset, that will be faster for me to debug. I will try to run training from scratch now.
@andrew-zaborenko I was able to run the full training on your data:
Linear algorithm was disabled.
AutoML directory: download/AutoML_2
The task is binary_classification with evaluation metric logloss
AutoML will use algorithms: ['Random Forest', 'LightGBM', 'Xgboost', 'CatBoost', 'Neural Network']
AutoML will ensemble availabe models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'not_so_random', 'golden_features', 'insert_random_feature', 'features_selection', 'hill_climbing_1', 'hill_climbing_2', 'ensemble']
Skip simple_algorithms because no parameters were generated.
* Step default_algorithms will try to check up to 5 models
Custom validation strategy
Split 0.
Train 137036 samples.
Validation 137446 samples.
1_Default_LightGBM logloss 0.50397 trained in 25.31 seconds (1-sample predict time 0.0075 seconds)
2_Default_Xgboost logloss 0.505556 trained in 26.08 seconds (1-sample predict time 0.0079 seconds)
3_Default_CatBoost logloss 0.502192 trained in 33.14 seconds (1-sample predict time 0.0092 seconds)
4_Default_NeuralNetwork logloss 0.547975 trained in 133.33 seconds (1-sample predict time 0.0066 seconds)
5_Default_RandomForest logloss 0.573356 trained in 37.22 seconds (1-sample predict time 0.0147 seconds)
* Step not_so_random will try to check up to 20 models
10_LightGBM logloss 0.505233 trained in 25.52 seconds (1-sample predict time 0.0076 seconds)
6_Xgboost logloss 0.504457 trained in 25.85 seconds (1-sample predict time 0.0084 seconds)
14_CatBoost logloss 0.500955 trained in 43.31 seconds (1-sample predict time 0.0095 seconds)
18_RandomForest logloss 0.564637 trained in 44.04 seconds (1-sample predict time 0.0381 seconds)
22_NeuralNetwork logloss 0.536425 trained in 71.01 seconds (1-sample predict time 0.0067 seconds)
11_LightGBM logloss 0.51027 trained in 22.52 seconds (1-sample predict time 0.0083 seconds)
7_Xgboost logloss 0.506532 trained in 25.58 seconds (1-sample predict time 0.0075 seconds)
15_CatBoost logloss 0.503125 trained in 31.77 seconds (1-sample predict time 0.0094 seconds)
19_RandomForest logloss 0.535652 trained in 55.93 seconds (1-sample predict time 0.021 seconds)
23_NeuralNetwork logloss 0.52832 trained in 119.87 seconds (1-sample predict time 0.007 seconds)
12_LightGBM logloss 0.504643 trained in 29.45 seconds (1-sample predict time 0.0211 seconds)
8_Xgboost logloss 0.504307 trained in 26.01 seconds (1-sample predict time 0.0076 seconds)
16_CatBoost logloss 0.502493 trained in 35.59 seconds (1-sample predict time 0.0094 seconds)
20_RandomForest logloss 0.586218 trained in 35.63 seconds (1-sample predict time 0.0214 seconds)
24_NeuralNetwork logloss 0.541083 trained in 189.98 seconds (1-sample predict time 0.0068 seconds)
13_LightGBM logloss 0.504782 trained in 25.42 seconds (1-sample predict time 0.0076 seconds)
9_Xgboost logloss 0.508033 trained in 25.08 seconds (1-sample predict time 0.0178 seconds)
17_CatBoost logloss 0.506843 trained in 27.3 seconds (1-sample predict time 0.0093 seconds)
21_RandomForest logloss 0.542927 trained in 72.01 seconds (1-sample predict time 0.2759 seconds)
25_NeuralNetwork logloss 0.532657 trained in 140.09 seconds (1-sample predict time 0.0067 seconds)
* Step golden_features will try to check up to 3 models
None 10
Add Golden Feature: feature_38_diff_feature_43
Add Golden Feature: feature_39_diff_feature_43
Add Golden Feature: feature_33_diff_feature_39
Add Golden Feature: feature_15_diff_feature_38
Add Golden Feature: feature_28_sum_feature_19
Add Golden Feature: feature_43_sum_feature_24
Add Golden Feature: feature_42_diff_feature_43
Add Golden Feature: feature_19_diff_feature_33
Add Golden Feature: feature_28_diff_feature_43
Add Golden Feature: feature_44_sum_feature_19
Created 10 Golden Features in 3.28 seconds.
14_CatBoost_GoldenFeatures logloss 0.500985 trained in 51.26 seconds (1-sample predict time 0.0131 seconds)
3_Default_CatBoost_GoldenFeatures logloss 0.502583 trained in 32.62 seconds (1-sample predict time 0.0136 seconds)
16_CatBoost_GoldenFeatures logloss 0.502545 trained in 34.13 seconds (1-sample predict time 0.0138 seconds)
* Step insert_random_feature will try to check up to 1 model
14_CatBoost_RandomFeature logloss 0.501652 trained in 47.37 seconds (1-sample predict time 0.0101 seconds)
Drop features ['feature_25', 'feature_8', 'feature_14', 'feature_24', 'feature_32', 'feature_6', 'feature_16', 'feature_36', 'feature_44', 'random_feature', 'feature_21']
* Step features_selection will try to check up to 5 models
14_CatBoost_SelectedFeatures logloss 0.501171 trained in 44.77 seconds (1-sample predict time 0.0095 seconds)
1_Default_LightGBM_SelectedFeatures logloss 0.506758 trained in 24.03 seconds (1-sample predict time 0.0075 seconds)
8_Xgboost_SelectedFeatures logloss 0.504634 trained in 24.85 seconds (1-sample predict time 0.0075 seconds)
23_NeuralNetwork_SelectedFeatures logloss 0.524082 trained in 72.12 seconds (1-sample predict time 0.0065 seconds)
19_RandomForest_SelectedFeatures logloss 0.536358 trained in 53.63 seconds (1-sample predict time 0.043 seconds)
* Step hill_climbing_1 will try to check up to 14 models
26_CatBoost logloss 0.500334 trained in 63.61 seconds (1-sample predict time 0.0097 seconds)
27_CatBoost logloss 0.504142 trained in 33.76 seconds (1-sample predict time 0.0094 seconds)
28_CatBoost_GoldenFeatures logloss 0.500188 trained in 71.3 seconds (1-sample predict time 0.0132 seconds)
29_CatBoost_GoldenFeatures logloss 0.503203 trained in 33.5 seconds (1-sample predict time 0.0132 seconds)
30_LightGBM logloss 0.50397 trained in 26.11 seconds (1-sample predict time 0.0078 seconds)
31_LightGBM logloss 0.50397 trained in 26.04 seconds (1-sample predict time 0.0076 seconds)
32_Xgboost logloss 0.504952 trained in 26.52 seconds (1-sample predict time 0.0084 seconds)
33_Xgboost logloss 0.504668 trained in 25.7 seconds (1-sample predict time 0.0283 seconds)
34_Xgboost logloss 0.50466 trained in 27.18 seconds (1-sample predict time 0.0079 seconds)
35_LightGBM logloss 0.504643 trained in 25.56 seconds (1-sample predict time 0.0076 seconds)
36_NeuralNetwork_SelectedFeatures logloss 0.519513 trained in 62.5 seconds (1-sample predict time 0.0063 seconds)
37_NeuralNetwork logloss 0.532102 trained in 79.27 seconds (1-sample predict time 0.0068 seconds)
38_RandomForest logloss 0.534768 trained in 63.76 seconds (1-sample predict time 0.0215 seconds)
39_RandomForest_SelectedFeatures logloss 0.535528 trained in 83.57 seconds (1-sample predict time 0.0619 seconds)
* Step hill_climbing_2 will try to check up to 18 models
40_CatBoost_GoldenFeatures logloss 0.499759 trained in 69.78 seconds (1-sample predict time 0.0136 seconds)
41_CatBoost_GoldenFeatures logloss 0.500986 trained in 70.19 seconds (1-sample predict time 0.0134 seconds)
42_CatBoost logloss 0.499971 trained in 65.87 seconds (1-sample predict time 0.0094 seconds)
43_CatBoost logloss 0.500841 trained in 66.31 seconds (1-sample predict time 0.0096 seconds)
44_LightGBM logloss 0.506009 trained in 24.3 seconds (1-sample predict time 0.0092 seconds)
45_LightGBM logloss 0.506009 trained in 24.38 seconds (1-sample predict time 0.0076 seconds)
46_Xgboost logloss 0.50451 trained in 26.24 seconds (1-sample predict time 0.013 seconds)
47_Xgboost logloss 0.504931 trained in 25.78 seconds (1-sample predict time 0.0078 seconds)
48_Xgboost logloss 0.50467 trained in 26.27 seconds (1-sample predict time 0.015 seconds)
49_Xgboost logloss 0.505664 trained in 27.15 seconds (1-sample predict time 0.0076 seconds)
50_NeuralNetwork_SelectedFeatures logloss 0.519465 trained in 77.61 seconds (1-sample predict time 0.0065 seconds)
51_NeuralNetwork_SelectedFeatures logloss 0.525571 trained in 73.46 seconds (1-sample predict time 0.0068 seconds)
52_NeuralNetwork_SelectedFeatures logloss 0.529746 trained in 72.51 seconds (1-sample predict time 0.0064 seconds)
53_NeuralNetwork_SelectedFeatures logloss 0.519692 trained in 108.88 seconds (1-sample predict time 0.0066 seconds)
54_RandomForest logloss 0.535093 trained in 66.09 seconds (1-sample predict time 0.038 seconds)
55_RandomForest logloss 0.535288 trained in 49.17 seconds (1-sample predict time 0.0319 seconds)
56_RandomForest_SelectedFeatures logloss 0.535469 trained in 54.35 seconds (1-sample predict time 0.0286 seconds)
57_RandomForest_SelectedFeatures logloss 0.536083 trained in 63.79 seconds (1-sample predict time 0.0347 seconds)
* Step ensemble will try to check up to 1 model
Ensemble logloss 0.49663 trained in 110.36 seconds (1-sample predict time 0.1547 seconds)
AutoML fit time: 3588.5 seconds
AutoML best model: Ensemble
The final ensemble:
{
    "name": "Ensemble",
    "ml_task": "binary_classification",
    "optimize_metric": "logloss",
    "selected_models": [
        {
            "model": "14_CatBoost",
            "repeat": 2.0
        },
        {
            "model": "14_CatBoost_GoldenFeatures",
            "repeat": 4.0
        },
        {
            "model": "14_CatBoost_SelectedFeatures",
            "repeat": 8.0
        },
        {
            "model": "16_CatBoost",
            "repeat": 2.0
        },
        {
            "model": "16_CatBoost_GoldenFeatures",
            "repeat": 2.0
        },
        {
            "model": "34_Xgboost",
            "repeat": 1.0
        },
        {
            "model": "36_NeuralNetwork_SelectedFeatures",
            "repeat": 4.0
        },
        {
            "model": "3_Default_CatBoost",
            "repeat": 3.0
        },
        {
            "model": "40_CatBoost_GoldenFeatures",
            "repeat": 9.0
        },
        {
            "model": "42_CatBoost",
            "repeat": 3.0
        },
        {
            "model": "49_Xgboost",
            "repeat": 2.0
        },
        {
            "model": "50_NeuralNetwork_SelectedFeatures",
            "repeat": 2.0
        },
        {
            "model": "51_NeuralNetwork_SelectedFeatures",
            "repeat": 4.0
        },
        {
            "model": "53_NeuralNetwork_SelectedFeatures",
            "repeat": 5.0
        },
        {
            "model": "6_Xgboost",
            "repeat": 4.0
        },
        {
            "model": "8_Xgboost",
            "repeat": 1.0
        }
    ],
    "predictions_fname": "Ensemble/predictions_ensemble.csv",
    "metric_name": "logloss",
    "final_loss": 0.4966304943637485,
    "train_time": 110.355633020401,
    "is_stacked": false,
    "threshold": 0.5213613975344044
}
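Reading the JSON above, the "repeat" values act as integer weights, so the ensemble prediction should be a repeat-weighted average of the member models' predictions. A sketch with placeholder numbers (the combination rule is inferred from the weighted-ensemble description, not read from the source):
import numpy as np

# Placeholder out-of-fold predictions for three of the selected models.
member_preds = {
    "40_CatBoost_GoldenFeatures": np.array([0.61, 0.12, 0.88]),
    "14_CatBoost_SelectedFeatures": np.array([0.58, 0.15, 0.91]),
    "36_NeuralNetwork_SelectedFeatures": np.array([0.66, 0.10, 0.84]),
}
repeats = {
    "40_CatBoost_GoldenFeatures": 9,
    "14_CatBoost_SelectedFeatures": 8,
    "36_NeuralNetwork_SelectedFeatures": 4,
}

total = sum(repeats.values())
ensemble_pred = sum(repeats[m] * p for m, p in member_preds.items()) / total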
I was using the code that you provided, and all training parameters were unchanged except explain_level=0, to speed up the training.
Which Python version are you using? Is it 3.7, 3.8, or 3.9? What operating system are you using? I was testing on Ubuntu 20.04 with Python 3.7.
Hi,
I was using Python 3.7.6 on CentOS 7. I'll look into the dependencies; maybe I was using an older version of one of the needed packages.
What parameters did you change in the progress.json and params.json files?
The changes in the params.json file:
98c98
<     "fit_level": "finished",
---
>     "fit_level": "hill_climbing_2",
The changes in progress.json:
2c2
<     "fit_level": "finished",
---
>     "fit_level": "hill_climbing_2",
3213,3222d3212
<         ],
<         "ensemble": [
<             {
<                 "model_type": "ensemble",
<                 "is_stacked": false,
<                 "name": "Ensemble",
<                 "status": "error",
<                 "final_loss": null,
<                 "train_time": null
<             }
After these changes, please increase the total_time_limit (for example, multiply it by 10) and just run AutoML with the same parameters; the Ensemble should be trained ...
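For anyone resuming from the same broken state, a hedged sketch of applying those two edits with Python; the fit_level key locations are taken from the diff above, while the exact nesting of the errored "ensemble" block inside progress.json is not fully shown, so that part is left as a manual step:
import json
from pathlib import Path

results = Path("binary_classification_Perform_cv")  # the AutoML results_path

# Roll fit_level back so AutoML resumes at the ensemble step.
for fname in ("params.json", "progress.json"):
    f = results / fname
    data = json.loads(f.read_text())
    data["fit_level"] = "hill_climbing_2"  # assumed to be a top-level key
    f.write_text(json.dumps(data, indent=4))

# In progress.json, also delete the errored block by hand:
# the entry with {"model_type": "ensemble", "status": "error", ...}.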