Add support for Multiregression tasks

Open ThomasWolf0701 opened this issue 3 years ago • 3 comments

I tried the Python version of dalex with a multiregression (multi-output regression) model and it gave an error (see below). Is there any way around it? If I understand correctly, iBreakDown/pyBreakDown can deal with multiple classes in classification, whose predicted probabilities are also organized in multiple columns/arrays, so this should be quite similar. It would be great if this were enabled. The SHAP package also supports SHAP values for the multiregression case.

Can I call iBreakDown directly from dalex, without generating an explainer object? iBreakDown for Python has not been updated in a while, but the new Python dalex seems quite active.

# decision tree for multioutput regression
import dalex as dx
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

# define model
model = DecisionTreeRegressor()
model.fit(X, y)

dx.Explainer(model, X, y)

data is converted to pd.DataFrame, columns are set as string numbers
  -> data : 1000 rows 10 cols
Traceback (most recent call last):

  File "", line 11, in dx.Explainer(model,X,y)

  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\dalex\_explainer\object.py", line 131, in __init__
    y = check_y(y, data, verbose)

  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\dalex\_explainer\checks.py", line 52, in check_y
    raise ValueError("y must have only one dimension")

ValueError: y must have only one dimension
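
The error message points at the shape of `y`: with `n_targets=2`, `make_regression` returns a target array with two columns rather than a flat vector. A minimal sketch of that shape issue, using only NumPy and a zero-filled stand-in for the real target; `is_one_dimensional` is a hypothetical helper for illustration, not dalex's actual `check_y`:

```python
import numpy as np

def is_one_dimensional(y):
    """Hypothetical helper: True if y is a flat vector of targets."""
    return np.asarray(y).ndim == 1

# Stand-in for the target returned by make_regression(n_samples=1000, n_targets=2)
y = np.zeros((1000, 2))

print(y.shape)                      # (1000, 2)
print(is_one_dimensional(y))        # False: two target columns
print(is_one_dimensional(y[:, 0]))  # True: a single target column
```

Selecting one column, as in `y[:, 0]`, yields the one-dimensional target that the explainer accepts.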

ThomasWolf0701, Nov 19 '20 17:11

We don't support multi-output models yet. You can adjust the predict_function to produce iBreakDown plots for a given output.

import dalex as dx
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

model = DecisionTreeRegressor()

model.fit(X,y)

exp_0 = dx.Explainer(model, X, y[:, 0], predict_function = lambda m, d: m.predict(d)[:, 0], label="output 0")
exp_1 = dx.Explainer(model, X, y[:, 1], predict_function = lambda m, d: m.predict(d)[:, 1], label="output 1")

exp_0.predict_parts(X[2, :]).plot(exp_1.predict_parts(X[2, :]))
y[2, :]
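
For models with more than two outputs, the same workaround could be generalized with a loop. The sketch below is an assumption-laden illustration, not part of dalex: `make_output_predict_function` and `build_explainers` are hypothetical helper names, and the explainer class is passed in as an argument (intended to be `dx.Explainer`) so that the block itself depends only on NumPy.

```python
import numpy as np

def make_output_predict_function(index):
    """Return a predict function that selects column `index` of the model output."""
    def predict_function(model, data):
        return np.asarray(model.predict(data))[:, index]
    return predict_function

def build_explainers(explainer_cls, model, X, y):
    """Create one explainer per target column (hypothetical helper).

    `explainer_cls` is expected to be dx.Explainer; it is called with the
    same arguments as in the two-explainer example above.
    """
    y = np.asarray(y)
    return [
        explainer_cls(
            model, X, y[:, i],
            predict_function=make_output_predict_function(i),
            label=f"output {i}",
        )
        for i in range(y.shape[1])
    ]
```

With the objects from the example above, this would be called as `explainers = build_explainers(dx.Explainer, model, X, y)`. The factory function is used instead of an inline `lambda m, d: m.predict(d)[:, i]` because lambdas created in a loop or comprehension all see the final value of `i`, so every explainer would silently explain the last output.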

hbaniecki, Nov 19 '20 18:11

Example https://dalex.drwhy.ai/python-dalex-multioutput added in https://github.com/ModelOriented/DALEX-docs/commit/47378806fa9d32b612fb84adb46640638956ede7.

hbaniecki, Apr 17 '22 14:04

Hi @hbaniecki, this would be nice.

But it would be important to consider the MultiOutput wrappers of scikit-learn.

Currently I am creating explainers for every target and then adding them to the plots:

image

(every line represents a model)
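
For the scikit-learn MultiOutput wrapper mentioned above, a minimal sketch of how the per-target models can be reached. `MultiOutputRegressor` fits one clone of the base estimator per target column and exposes them as `estimators_`. Each of these is an ordinary single-output model, so it could presumably be passed to `dx.Explainer` together with `y[:, i]` and no custom `predict_function`; that last step is an assumption and is not confirmed in this thread.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# MultiOutputRegressor fits one clone of the base estimator per target column
wrapper = MultiOutputRegressor(DecisionTreeRegressor(random_state=0))
wrapper.fit(X, y)

# Each fitted sub-estimator predicts exactly one column of the wrapper's output
matches = [
    np.allclose(estimator.predict(X), wrapper.predict(X)[:, i])
    for i, estimator in enumerate(wrapper.estimators_)
]
print(matches)
```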

edgBR, May 02 '22 15:05