deep_learning_for_tabular_data

Feature importance

gladomat opened this issue 5 years ago · 3 comments

How would you go about finding the feature importance for the DNN model?

gladomat avatar May 27 '20 15:05 gladomat

Both LIME (https://github.com/marcotcr/lime) and SHAP (https://github.com/slundberg/shap) can provide you with the feature importance of a DNN model. They require some extra work, though. I will write some articles on that in the near future.
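
For example, a minimal LIME sketch for a regression DNN might look like the following (this assumes a trained Keras model named model and a pandas DataFrame X_train of numeric features; the names are illustrative, not from the repo, and categorical columns would additionally need LIME's categorical_features argument):

from lime.lime_tabular import LimeTabularExplainer

# Build the explainer on the raw training matrix
explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns.to_list(),
    mode='regression')

# Explain one row; LIME hands a 2-D NumPy array to the predict function,
# and in regression mode that function must return a 1-D array, hence ravel()
exp = explainer.explain_instance(
    X_train.values[0],
    lambda x: model.predict(x).ravel(),
    num_features=10)
print(exp.as_list())  # (feature, weight) pairs, sorted by importance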

lmassaron avatar May 27 '20 16:05 lmassaron

Thanks for your answer. I tried SHAP but kept running into trouble with the inputs, and unfortunately I haven't been able to figure it out. I would really appreciate a little tutorial on it.

gladomat avatar May 28 '20 08:05 gladomat

I've found the following solution. Kernel SHAP passes the data to the prediction function as a plain NumPy array, which loses the feature names, but tb.transform needs those names to differentiate categorical from numerical features, so a wrapper has to restore them:

import pandas as pd

# tb is the fitted tabular transformer; model is the trained DNN
feature_names = X_train.columns.to_list()

def model_predict(data_asarray):
    # Restore the column names that SHAP strips off, so tb.transform
    # can separate categorical from numerical features
    data_asframe = pd.DataFrame(data_asarray, columns=feature_names)
    x = tb.transform(data_asframe)
    return model.predict(x)

Then you can use Kernel SHAP to get the values:

import shap

# Kernel SHAP, with the first 50 training rows as the background dataset
explainer = shap.KernelExplainer(model_predict, X_train.iloc[:50, :])
# X holds the rows to explain (e.g. the test set); row 299 is one example
shap_values = explainer.shap_values(X.iloc[299, :], nsamples=500)
# shap_values is a list with one array per model output; [0] takes the first
shap.decision_plot(explainer.expected_value, shap_values[0],
                   features=X_train.iloc[:50, :], feature_names=feature_names)
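
For a global view instead of a single-row explanation, the same explainer can be run over a sample of rows and the results aggregated with a summary plot. This is only a sketch reusing the names above, and the [0] indexing may need adjusting for your model's output shape:

sample = X_train.iloc[:50, :]
shap_values_all = explainer.shap_values(sample, nsamples=500)
# The bar variant averages |SHAP value| per feature, giving a global ranking
shap.summary_plot(shap_values_all[0], features=sample,
                  feature_names=feature_names, plot_type='bar')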

I found the solution on Stack Exchange, but I don't remember the link, so that's all the credit I can give.

gladomat avatar Jun 02 '20 12:06 gladomat