explainerdashboard
ExplainerDashboard for multioutput regressors
Hello,
Is it possible to use explainerdashboard for multi-output regressors? I have set up a regression model using scikit-learn's random forest regressor to build models for multiple target variables. However, the explainerdashboard RegressionExplainer seems to accept only a y argument with a single variable/column, and it does not support a DataFrame with multiple targets/outputs.
Alternatively, I have tried letting the user pick which output they wish to explain/visualize, fitting the model on that output and building the explainerdashboard for it. However, I am unable to let the user enter another output choice and re-create the dashboard within the same run. Would there be a way to do this instead?
Appreciate your inputs.
Regards, Andy
Hi @andypatrac,
Honestly, I didn't even know this was a thing :)
Anyway, no, that is currently not supported. Classifiers can be multiclass, but regressors are assumed to be single-output.
Have you tried getting SHAP values directly with the shap library? Does that work? If it is supported, in principle it could also be included in the dashboard, though I'm not sure how many modifications would have to be made to accommodate it...
(supporting multiclass was quite a lot of work, that affected basically every part of the library)
Hi @oegedijk ,
Thanks for the quick response. The shap library does support calculating multi-output SHAP values. In this case, the SHAP values are essentially a list of matrices, one per output variable. I understand it might be a lot of work to support this in explainerdashboard.
As an alternative/workaround, I thought the user could select the single output they wish to view the dashboard for, and we could program it to perform the regression for that output and display it on the dashboard. This works for the user's first output selection. However, once dashboard.run() is called, the program does not seem to proceed any further (at least in debug mode). It just stays stuck and never moves on to asking for a different output the user might want to review. Is there a way to stop the dashboard server on cue after the run command, so that the program can ask the user for the next output selection and re-run the dashboard?
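To make the "list of matrices" layout concrete, here is a small numpy-only sketch. It does not use KernelExplainer; instead it assumes a linear multi-output model, for which the exact SHAP value of feature j for output k (assuming independent features and the background mean as baseline) is `coef[k, j] * (x_j - mean_j)`. The result has the same shape as described above: one (n_samples, n_features) matrix per output, each satisfying base value + row sum = prediction. The function name `linear_multioutput_shap` is hypothetical.

```python
import numpy as np

def linear_multioutput_shap(coef, intercept, X, background):
    """Exact SHAP values for a linear multi-output model f_k(x) = coef[k] @ x + intercept[k].

    Assumes independent features and uses the background mean as the baseline.
    Returns (list of per-output SHAP matrices, array of per-output base values).
    """
    coef = np.asarray(coef, dtype=float)            # shape (n_outputs, n_features)
    intercept = np.asarray(intercept, dtype=float)  # shape (n_outputs,)
    X = np.asarray(X, dtype=float)                  # shape (n_samples, n_features)
    mean = np.asarray(background, dtype=float).mean(axis=0)
    # One (n_samples, n_features) matrix per output, mirroring shap's list layout
    shap_list = [(X - mean) * coef[k] for k in range(coef.shape[0])]
    # Base value per output: the model's prediction at the baseline (mean) input
    base_values = coef @ mean + intercept
    return shap_list, base_values

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))        # 6 samples, 5 features (like X1-X5)
coef = rng.normal(size=(2, 5))     # 2 outputs (like Y1, Y2)
intercept = np.array([1.0, -2.0])
shap_list, base_values = linear_multioutput_shap(coef, intercept, X, X)
preds = X @ coef.T + intercept     # shape (6, 2)

for k, sv in enumerate(shap_list):
    # Additivity check: base value + sum of SHAP values reproduces each prediction
    print(k, sv.shape, np.allclose(base_values[k] + sv.sum(axis=1), preds[:, k]))
```

Each matrix in `shap_list` is what a single-output explainer would need, which is why a per-output view (dropdown or one dashboard per target) maps naturally onto this structure.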
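One general-purpose way to avoid Ctrl+C is to launch each dashboard in its own child process, so the main program keeps control and can stop the server before asking for the next output. The sketch below uses only the standard library `subprocess` module; `serve_output.py` is a hypothetical script that would fit the model for one output and call `ExplainerDashboard(...).run()`, so the interactive loop is left commented out.

```python
import subprocess
import sys

def start_dashboard(cmd):
    """Start a (blocking) dashboard command in a child process and return its handle."""
    return subprocess.Popen(cmd)

def stop_dashboard(proc, timeout=10):
    """Ask the child process to terminate, killing it if it does not exit in time."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode

# Hypothetical interactive loop; serve_output.py is an assumed script that builds
# and runs the dashboard for the output name passed on the command line.
# while True:
#     target = input("Output to explain (blank to quit): ").strip()
#     if not target:
#         break
#     proc = start_dashboard([sys.executable, "serve_output.py", target])
#     input("Press Enter to close this dashboard...")
#     stop_dashboard(proc)
```

Running the server in a separate process also frees its port when the process exits, so the next dashboard can reuse the same port.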
Appreciate your inputs.
Yeah, that's essentially what the dashboard does with a multiclass example:
```python
from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_embarked

X_train, y_train, X_test, y_test = titanic_embarked()
model = RandomForestClassifier(n_estimators=5, max_depth=2).fit(X_train, y_train)
explainer = ClassifierExplainer(model, X_test, y_test,
                                labels=['Queenstown', 'Southampton', 'Cherbourg'])
ExplainerDashboard(explainer).run()
```
Really appreciate your inputs! This is nice; unfortunately, it is currently only available for classifier explainers.
I was thinking along the same lines, with the minor variation that the output selection is done from, say, a plotly dashboard (or in the terminal itself), which would then open a new window with the explainerdashboard for the selected variable/class. The issue I run into is that once the explainerdashboard is running, I have to kill it manually in the terminal with Ctrl+C. Once it is killed, the program proceeds to the next line of code and asks the user for the next output to review. I was hoping there was a better way to stop the running dashboard server than Ctrl+C.
So I think ideally the solution would be the same as the multiclass where you can select the output from a dropdown and then the whole dashboard adjusts to that selection.
However, that will probably require quite a bit of work.
Do you have a nice, simple example of a multi-output regression on a smallish dataset, showing how you get the SHAP values? Then I can see how much plumbing would have to be done behind the scenes to support it fluidly in the dashboard...
Sure. I'll put together a sample dataset of 20-30 rows for multi-output regression, along with a code snippet for getting the SHAP values. In the meantime, I figured that an ExplainerHub can serve as a temporary workaround. It is not as elegant as the multiclass support for classifiers, but it works: the idea is to loop over the output variables, build a separate model and explainerdashboard for each, and assemble them into the hub.
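The ExplainerHub workaround can be sketched as two steps: fit one single-output model per target column, then wrap each in its own dashboard. This assumes `RegressionExplainer(model, X, y)`, `ExplainerDashboard(explainer, title=...)` and `ExplainerHub(dashboards, title=...)` from explainerdashboard; the helper names `fit_per_output` and `build_hub` are hypothetical. explainerdashboard is imported lazily so the fitting step works on its own.

```python
import pandas as pd

def fit_per_output(X, Y, make_model):
    """Fit one single-output model per column of Y; returns {column: fitted model}."""
    # make_model is any factory returning an estimator whose fit() returns self
    return {col: make_model().fit(X, Y[col]) for col in Y.columns}

def build_hub(X, Y, models, title="Multi-output regression"):
    """Wrap each per-output model in its own dashboard and assemble an ExplainerHub.

    Assumes explainerdashboard is installed; imported here so that
    fit_per_output can be used without it.
    """
    from explainerdashboard import ExplainerDashboard, ExplainerHub, RegressionExplainer

    dashboards = [
        ExplainerDashboard(RegressionExplainer(model, X, Y[col]), title=f"Output {col}")
        for col, model in models.items()
    ]
    return ExplainerHub(dashboards, title=title)

# Example usage (assumes scikit-learn and the indVars/depVars split from this thread):
# from sklearn.ensemble import RandomForestRegressor
# models = fit_per_output(indVars, depVars, RandomForestRegressor)
# build_hub(indVars, depVars, models).run()
```

Fitting on the DataFrame (rather than `.values`) keeps the feature names, which the explainer uses for its plots.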
Hi @oegedijk,
I have attached a sample dataset with 39 rows, 5 input/feature variables (X1-X5) and 2 output variables (Y1-Y2). Below is the code snippet that uses scikit-learn's RandomForestRegressor to build the multi-output model and compute its SHAP values.
```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
import shap

# Read data from specified path (filePath: path to the attached sample CSV)
df = pd.read_csv(filePath)

# Identify independents (columns containing "X") and dependents (all others)
indLst = []
depLst = []
for col in df.columns:
    if "X" in col:
        indLst.append(col)
    else:
        depLst.append(col)

indVars = df[indLst]
depVars = df[depLst]

# Fit model and generate shap values
model = RandomForestRegressor()
model.fit(indVars.values, depVars.values)
expl_shap = shap.KernelExplainer(model.predict, indVars.values)
shapVals = expl_shap.shap_values(indVars.values)
print(shapVals)
shap.summary_plot(shapVals, indVars)
```
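Depending on the shap version, `shap_values` for a multi-output model may come back either as a list with one (n_samples, n_features) matrix per output, as described earlier in this thread, or (in newer releases, as far as I know) as a single 3-D array with the outputs on the last axis. A small helper that normalizes both layouts to a list makes it easy to pick out one output's values. `per_output_shap` is a hypothetical name, not part of shap.

```python
import numpy as np

def per_output_shap(shap_values):
    """Return a list with one (n_samples, n_features) SHAP matrix per output.

    Handles a list of 2-D arrays (one per output), a 3-D array laid out as
    (n_samples, n_features, n_outputs), or a plain 2-D array (single output).
    """
    if isinstance(shap_values, list):
        return [np.asarray(sv) for sv in shap_values]
    arr = np.asarray(shap_values)
    if arr.ndim == 3:
        # Outputs on the last axis: slice out one matrix per output
        return [arr[:, :, k] for k in range(arr.shape[2])]
    if arr.ndim == 2:
        return [arr]
    raise ValueError(f"unexpected SHAP array shape {arr.shape}")

# e.g. per_output_shap(shapVals)[0] -> SHAP values for the first dependent column (Y1)
```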
Thank you, Andy
Hi @oegedijk! Thank you for providing such wonderful code. I have a question: can I save the dashboard and run it at a later time without training the model again? I'd appreciate your help.
Yes, you can find more info here on how to store and load the dashboard to disk (without having to recalculate the shap values): https://explainerdashboard.readthedocs.io/en/latest/deployment.html#storing-explainer-and-running-default-dashboard-with-gunicorn
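A minimal load-or-build pattern for this, sketched under assumptions: the generic helper below (hypothetical name `load_or_build`) only calls user-supplied `build`, `dump` and `load` functions. The commented usage assumes explainerdashboard's `explainer.dump(path)` and `RegressionExplainer.from_file(path)`, and that constructing an `ExplainerDashboard` triggers the SHAP calculations, so the explainer is dumped after that step; check the linked docs for the exact recommended workflow.

```python
import os

def load_or_build(path, build, dump, load):
    """Load an object from `path` if it exists; otherwise build it, dump it, and return it."""
    if os.path.exists(path):
        return load(path)
    obj = build()
    dump(obj, path)
    return obj

# Sketch with explainerdashboard (assumes a fitted model plus X_test/y_test):
# from explainerdashboard import ExplainerDashboard, RegressionExplainer
#
# def build():
#     explainer = RegressionExplainer(model, X_test, y_test)
#     ExplainerDashboard(explainer)  # assumed to compute SHAP values etc. up front
#     return explainer
#
# explainer = load_or_build("explainer.joblib", build,
#                           dump=lambda e, p: e.dump(p),
#                           load=RegressionExplainer.from_file)
# ExplainerDashboard(explainer).run()
```

On the second run the stored explainer (including the fitted model) is loaded from disk, so neither training nor the SHAP computation is repeated.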