mljar-supervised How to select models for more SHAP plots?

Hello MLJAR Team! I followed the attached tutorial and would like to know how to use a specific model for predictions and for more detailed Shapley values. This is the tutorial code I ran:

import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML
#> IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html

import supervised
supervised.__version__
#> '1.0.2'

data = datasets.load_iris()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target").map({i:v for i, v in enumerate(data["target_names"])})

# Use 70% for training
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3)

automl = AutoML(total_time_limit=5*60)
automl.fit(X_train, y_train)
#> AutoML directory: AutoML_2
#> The task is multiclass_classification with evaluation metric logloss
#> AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
#> AutoML will ensemble available models
#> AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
#> * Step simple_algorithms will try to check up to 3 models
#> 1_Baseline logloss 1.098612 trained in 0.29 seconds
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/supervised/utils/shap.py:116: UserWarning: The figure layout has changed to tight
#> DecisionTreeAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
#> Problem during computing permutation importance. Skipping ...
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:972: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> 2_DecisionTree logloss 0.013075 trained in 4.52 seconds
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/shap/plots/_beeswarm.py:925: UserWarning: The figure layout has changed to tight
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/supervised/utils/shap.py:116: UserWarning: The figure layout has changed to tight
#> LinearAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
#> Problem during computing permutation importance. Skipping ...
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:972: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> 3_Linear logloss 0.163424 trained in 5.84 seconds
#> * Step default_algorithms will try to check up to 3 models
#> XgbAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
#> Problem during computing permutation importance. Skipping ...
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/shap/plots/_beeswarm.py:925: UserWarning: The figure layout has changed to tight
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/supervised/utils/shap.py:116: UserWarning: The figure layout has changed to tight
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:972: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
#> 4_Default_Xgboost logloss 0.010908 trained in 5.33 seconds
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> MLPAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
#> Problem during computing permutation importance. Skipping ...
#> 5_Default_NeuralNetwork logloss 0.263295 trained in 0.33 seconds
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/supervised/utils/shap.py:116: UserWarning: The figure layout has changed to tight
#> RandomForestAlgorithm should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
#> Problem during computing permutation importance. Skipping ...
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:972: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> 6_Default_RandomForest logloss 0.027566 trained in 4.44 seconds
#> * Step ensemble will try to check up to 1 model
#> /Users/michaelmazzucco/Desktop/stiffness_ml/venv/lib/python3.10/site-packages/sklearn/metrics/_classification.py:2916: UserWarning: The y_pred values do not sum to one. Starting from 1.5 thiswill result in an error.
#> Ensemble logloss 0.010908 trained in 0.35 seconds
#> AutoML fit time: 29.33 seconds
#> AutoML best model: 4_Default_Xgboost
#> AutoML(total_time_limit=300)

# Predict
y_predicted = automl.predict(X_test)

result = pd.DataFrame({"Predicted": y_predicted, "Target": np.array(y_test)})
filtro = result.Predicted == result.Target
print(filtro.value_counts(normalize=True))
#> True     0.955556
#> False    0.044444
#> Name: proportion, dtype: float64

How can I select a particular model for further use? Whether it is another XGBoost model or even a neural network, how can I directly select that model to generate something like the following?

import xgboost
import shap
#> Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)

# train XGBoost model
X,y = shap.datasets.adult()
model = xgboost.XGBClassifier().fit(X, y)

# compute SHAP values
explainer = shap.Explainer(model, X)
shap_values = explainer(X)
#>  99%|===================| 32381/32561 [01:05<00:00]       

shap.plots.waterfall(shap_values[0])

Any direction is much appreciated!

Jul 28 '23 23:07 michael-mazzucco

Hi @michael-mazzucco,

I think the easiest way would be to extract the model parameters and manually build a single model. Sorry!

There is an issue to add an option to extract a model from AutoML and create explanations for its predictions, but it is not implemented yet.

Aug 01 '23 09:08 pplonski

Hi @pplonski! I see. Do you have any suggestions on how to do this? Is it done using the JSON file? I will keep an eye out for that update.

Aug 01 '23 12:08 michael-mazzucco

Try training the model using the hyperparameters from the README.md file in the model directory; you can also check the hyperparameters in the JSON file.

It is important to use the same preprocessing as well.
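
As a rough sketch of the first step, the hyperparameters could be read from the JSON file like this. The file name `framework.json`, the `params` / `learner` layout, the excluded keys, and the helper `load_learner_params` are all assumptions for illustration, so check them against your own model directory:

```python
import json
import tempfile
from pathlib import Path

# Assumption: keys that describe the AutoML wrapper rather than the
# underlying estimator. Adjust after inspecting your own file.
NON_ESTIMATOR_KEYS = {"model_type", "ml_task"}


def load_learner_params(json_path):
    """Return the learner hyperparameters stored in a model's JSON file.

    Assumes a layout like {"params": {"learner": {...}}}. This layout is
    an assumption, not a documented contract, so inspect the real file first.
    """
    with open(json_path, encoding="utf-8") as f:
        config = json.load(f)
    learner = config["params"]["learner"]
    # Drop the keys that are not estimator hyperparameters.
    return {k: v for k, v in learner.items() if k not in NON_ESTIMATOR_KEYS}


# Demonstration with a made-up file; the values are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "framework.json"
    demo_path.write_text(
        json.dumps(
            {"params": {"learner": {"model_type": "Xgboost", "max_depth": 6, "eta": 0.075}}}
        ),
        encoding="utf-8",
    )
    demo_params = load_learner_params(demo_path)

print(demo_params)
```

The resulting dictionary could then be passed to a standalone estimator (for example `xgboost.XGBClassifier(**demo_params)`), although some key names may need to be mapped to that library's own argument names. The data should go through the same preprocessing before SHAP values are computed on the rebuilt model.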

Aug 01 '23 13:08 pplonski

Will this problem be solved in the next update, @pplonski? Explanations are so important.

Oct 18 '23 02:10 williamty

I would love to add it, @williamty. The last time I tried to add it, there were some problems with the shap package and data preprocessing.

Right now I don't have time to implement this, so let's wait for a PR or for an intern (from time to time we have interns in the company, and they help a lot with our open-source work!).

Oct 18 '23 07:10 pplonski
