DeepTables
SHAP values with DeepTables
I want to use SHAP values (https://github.com/slundberg/shap) to get feature importances, and I thought of using the KernelExplainer. The problem I encounter is that the embeddings of categorical variables are done on the fly, but I can only pass the non-embedded test set. How can I gain access to the embedded data? Is there a way?
DT provides `apply(...)` to easily extract the outputs of a given layer.
Related test cases: https://github.com/DataCanvasIO/deeptables/blob/0be75d22184a49e0201f7733a8cc854379bacbc5/tests/models/deeptable_test.py#L78
I've got SHAP values working with DT. You need a helper function to get the feature names back into a DataFrame: SHAP's `KernelExplainer` accepts a pandas DataFrame but calls the model with a plain numpy array, and `dt.predict()` needs the feature names, otherwise it won't know which columns are categorical and need to be embedded. Here's how to do it:
```python
import pandas as pd
import shap

feature_names = X_train.columns.to_list()

def model_predict(data_asarray):
    # KernelExplainer passes a bare numpy array; rebuild the DataFrame
    # so dt.predict() can tell which columns are categorical.
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    return dt.predict(data_asframe)

explainer = shap.KernelExplainer(model_predict, X_train.iloc[:50, :])
shap_values = explainer.shap_values(X.iloc[299, :], nsamples=500)
shap.decision_plot(explainer.expected_value, shap_values[0],
                   features=X_train.iloc[:50, :], feature_names=feature_names)
```
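The crux of the wrapper is the round trip from SHAP's numpy arrays back to a named DataFrame. A minimal self-contained sketch of that pattern, with a stub standing in for `dt.predict` (the column names and the stub's scoring logic are illustrative, not part of DeepTables):

```python
import numpy as np
import pandas as pd

# Hypothetical feature names; in practice use X_train.columns.to_list().
feature_names = ["age", "city", "income"]

def stub_predict(df):
    """Stand-in for dt.predict: it needs named columns to know
    which features are categorical and must be embedded."""
    assert list(df.columns) == feature_names  # names survived the round trip
    return df["income"].to_numpy(dtype=float)  # toy score

def model_predict(data_asarray):
    # SHAP hands the model a bare numpy array; rebuild the DataFrame
    # so the column names are restored before prediction.
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    return stub_predict(data_asframe)

X = pd.DataFrame([[25, 1, 40000.0], [52, 0, 90000.0]], columns=feature_names)
print(model_predict(X.to_numpy()))  # → [40000. 90000.]
```

The same wrapper can be passed to `shap.KernelExplainer` in place of the raw model, as in the snippet above.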
@gladomat Thank you for your contribution for DT with SHAP.