
SHAP values with DeepTables

Open gladomat opened this issue 4 years ago • 3 comments

I want to use SHAP values (https://github.com/slundberg/shap) to get feature importances, and I thought of using the KernelExplainer. The problem I encounter is that the embeddings of the categorical variables are computed on the fly, but I can only pass the non-embedded test set. How can I gain access to the embedded data? Is there a way?

gladomat avatar May 18 '20 15:05 gladomat

DT provides apply(...) to easily extract the outputs of a given layer.

Related test cases: https://github.com/DataCanvasIO/deeptables/blob/0be75d22184a49e0201f7733a8cc854379bacbc5/tests/models/deeptable_test.py#L78  
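The exact signature of apply(...) is shown in the linked test case. As a library-free sketch of what "extract the outputs of a certain layer" means, here is a toy forward pass where the intermediate embedding is returned directly instead of the final prediction (all names below are hypothetical stand-ins, not DeepTables API):

```python
import numpy as np

# Toy stand-in for a network: an embedding lookup followed by a dense layer.
# Extracting a "layer output" just means stopping the forward pass early,
# which is conceptually what DT's apply(...) does for a chosen layer.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(10, 4))  # 10 categories -> 4-dim vectors
dense_weights = rng.normal(size=(4, 1))

def embed(cat_ids):
    # "embedding layer": map integer category ids to learned vectors
    return embedding_table[cat_ids]

def forward(cat_ids):
    # full model: embedding followed by a dense projection
    return embed(cat_ids) @ dense_weights

cat_ids = np.array([3, 7, 1])
embedded = embed(cat_ids)   # the intermediate output, shape (3, 4)
outputs = forward(cat_ids)  # the final model output, shape (3, 1)
```

The embedded matrix is the kind of fixed-size numeric representation an explainer can work with, whereas the raw categorical ids are not.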

jackguagua avatar May 19 '20 01:05 jackguagua

I've got SHAP values working with DT. You need a helper function to get the feature names back into a dataframe: SHAP accepts a pandas dataframe but passes the model a bare numpy array, and dt.predict() needs the feature names, otherwise it can't tell which columns are categorical and need to be embedded. Here's how to do it:

import pandas as pd
import shap

feature_names = X_train.columns.to_list()

def model_predict(data_asarray):
    # SHAP hands us a bare numpy array; restore the column names so
    # dt.predict() can recognize the categorical columns.
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    return dt.predict(data_asframe)

# Use a small background sample to keep KernelExplainer tractable.
explainer = shap.KernelExplainer(model_predict, X_train.iloc[:50, :])
shap_values = explainer.shap_values(X.iloc[299, :], nsamples=500)
shap.decision_plot(explainer.expected_value, shap_values[0],
                   features=X.iloc[299, :], feature_names=feature_names)
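The round-trip above (DataFrame to numpy array and back) is the part that commonly breaks, and it can be checked without DT or SHAP installed. Below is a minimal sketch with a mock in place of dt.predict (the column names and the mock model are made-up for illustration):

```python
import numpy as np
import pandas as pd

feature_names = ["age", "income", "city"]  # hypothetical columns
X_train = pd.DataFrame(
    [[25, 50000, 2], [40, 80000, 0], [33, 62000, 1]],
    columns=feature_names,
)

def mock_predict(data_asframe):
    # Stands in for dt.predict: fails loudly if the column names were lost.
    assert list(data_asframe.columns) == feature_names
    return data_asframe["income"].to_numpy() / 100000.0

def model_predict(data_asarray):
    # KernelExplainer calls the model with a bare numpy array,
    # so we restore the column names before predicting.
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    return mock_predict(data_asframe)

preds = model_predict(X_train.to_numpy())  # 0.5, 0.8, 0.62
```

If model_predict returns without tripping the assertion, the wrapper is safe to hand to shap.KernelExplainer.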

gladomat avatar Jun 03 '20 13:06 gladomat

@gladomat Thank you for your contribution on using SHAP with DT.

jackguagua avatar Sep 07 '20 06:09 jackguagua