mljar-supervised

No Shap outputs

Open dbrami opened this issue 2 years ago • 9 comments

Hi, I'm not seeing any shap outputs when using the following:

# Initialize AutoML in Explain mode
from supervised.automl import AutoML

automl = AutoML(mode="Explain",
                explain_level=2,
                ml_task="multiclass_classification")
automl.fit(X, y)  # X, y: the reporter's features and target (not shown)

This is in spite of shap being properly installed. What I get out of the above code is the following:

AutoML directory: AutoML_7
The task is multiclass_classification with evaluation metric logloss
AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 3 models
1_Baseline logloss 3.229533 trained in 25.56 seconds
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
2_DecisionTree logloss 2.15877 trained in 59.34 seconds
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
3_Linear logloss 1.707406 trained in 47.68 seconds
* Step default_algorithms will try to check up to 2 models
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
4_Default_NeuralNetwork logloss 4.045366 trained in 7.02 seconds
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
5_Default_RandomForest logloss 1.858415 trained in 75.39 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 1.288517 trained in 0.56 seconds
AutoML fit time: 226.47 seconds
AutoML best model: Ensemble
AutoML(explain_level=2, ml_task='multiclass_classification')

dbrami avatar Dec 06 '22 06:12 dbrami
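
One way to confirm that SHAP artifacts are really absent, rather than just not displayed, is to search the results directory named in the log (AutoML_7 here). The following is a minimal sketch: the helper name find_shap_outputs is hypothetical, and matching on "shap" in the file name is an assumption about how the outputs are named, not documented behavior.

```python
from pathlib import Path


def find_shap_outputs(results_dir):
    """Return files under results_dir whose name contains 'shap' (case-insensitive).

    Hypothetical helper: the 'shap' substring match is an assumption about
    file naming, not something the library guarantees.
    """
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(
        str(path)
        for path in root.rglob("*")
        if path.is_file() and "shap" in path.name.lower()
    )


# "AutoML_7" is the directory reported in the log above.
matches = find_shap_outputs("AutoML_7")
print(matches if matches else "No files with 'shap' in their name were found.")
```

An empty result suggests the SHAP step was skipped or failed, which is a different problem from plots that exist but are not rendered in the report.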

Thanks @dbrami for reporting. Is it possible to include data to reproduce the issue?

pplonski avatar Dec 06 '22 08:12 pplonski

Sure. Uploading my ipynb and data Archive.zip

dbrami avatar Dec 06 '22 21:12 dbrami

Hi Pavel, Any luck?

dbrami avatar Dec 12 '22 22:12 dbrami

It happened to me too, @dbrami, but with tree visualizations and the same explain_level value. Let me know if you find something.

jasperan avatar May 30 '23 21:05 jasperan

It happened to me too, also with tree visualizations. I think it may be related to the task: perhaps tree visualizations are not suitable for binary classification.

williamty avatar Oct 18 '23 02:10 williamty

Hi @williamty, please make sure that you have the latest version of the package (pip install -U mljar-supervised); decision trees should be produced. Regarding the missing SHAP plots, it might be a bug.

pplonski avatar Oct 18 '23 07:10 pplonski
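
When reporting back after an upgrade, it helps to state exactly which versions are installed. A minimal sketch using only the standard library follows; the helper name installed_version is hypothetical, and the two package names are the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as they appear on PyPI.
for name in ("mljar-supervised", "shap"):
    print(name, installed_version(name) or "not installed")
```

Including both versions in a report makes it easier to tell whether a change in the shap package is involved.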

I have the same issue with the latest version: no SHAP values are produced. Is there a previous stable version w.r.t. this feature?

csetzkorn avatar Oct 27 '23 12:10 csetzkorn

Maybe there were some changes in the shap API?

pplonski avatar Oct 28 '23 11:10 pplonski

I think the issue is that the current implementation does not accept object/category/string types, and everything needs to be numeric for SHAP to work, which kind of defeats the objective of AutoML should one use SHAP to guide feature selection ...

csetzkorn avatar Nov 14 '23 10:11 csetzkorn
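
If the hypothesis above is right, a possible workaround is to convert non-numeric columns to integer codes before calling fit. The sketch below assumes a pandas DataFrame as input; the helper name encode_non_numeric is hypothetical, and nothing in this thread confirms that this restores the SHAP outputs. Note that integer codes impose an arbitrary order on the categories, which can affect how importance values should be read.

```python
import pandas as pd


def encode_non_numeric(X):
    """Return a copy of X with every non-numeric column replaced by integer codes.

    Codes follow order of first appearance; missing values become -1.
    Hypothetical workaround, not part of the mljar-supervised API.
    """
    X_encoded = X.copy()
    for column in X_encoded.columns:
        if not pd.api.types.is_numeric_dtype(X_encoded[column]):
            codes, _ = pd.factorize(X_encoded[column])
            X_encoded[column] = codes
    return X_encoded


# Small illustrative frame; the column names are made up for the example.
example = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.5, 3.0]})
encoded = encode_non_numeric(example)
print(encoded.dtypes)
```

The encoded frame can then be passed in place of the original, for example automl.fit(encode_non_numeric(X), y).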