xgboost-survival-embeddings
SHAP explanation for XGBSEKaplanTree or XGBSEBootstrapEstimator
Hi, is it possible to use SHAP with XGBSEKaplanTree or XGBSEBootstrapEstimator? SHAP's TreeExplainer does not work with them. PermutationExplainer starts evaluating but then fails with the error "ValueError: max_evals=1785 is too low for the Permutation explainer, it must be at least 2 * num_features + 1 = 1799!"
I am not sure how to fix this error. If anyone can point me in the right direction, it would be really helpful. Thank you in advance.
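The bound in the error message can be checked by hand. The sketch below only reproduces the arithmetic quoted in the message; `min_permutation_max_evals` is a hypothetical helper name, not a SHAP function.

```python
def min_permutation_max_evals(num_features):
    """Lower bound quoted in the error message: 2 * num_features + 1."""
    return 2 * num_features + 1


# The message reports a minimum of 1799, which implies 899 features.
num_features = (1799 - 1) // 2
required = min_permutation_max_evals(num_features)
print(num_features, required)
```

Any `max_evals` at or above this value satisfies the check, which is consistent with `max_evals=2000` getting past the error later in the thread.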
Hi @hellorp1990. Could you provide a code example showing how you are trying to use XGBSE with SHAP? Are you using the whole survival curve as your target, or have you transformed the predict function to output a single value?
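To illustrate the second option in the question, here is a minimal sketch of reducing a curve-valued predict function to one value per sample by picking a single time bucket. `toy_predict_curve` and `make_single_bucket_predict` are hypothetical stand-ins written in plain Python; they are not part of XGBSE or SHAP.

```python
def make_single_bucket_predict(predict_curve, bucket_index):
    """Wrap a curve-valued predict function so it returns one number per sample."""
    def predict_scalar(rows):
        return [curve[bucket_index] for curve in predict_curve(rows)]
    return predict_scalar


def toy_predict_curve(rows):
    # Hypothetical stand-in model: survival drops faster for larger row[0].
    return [[1.0 - 0.1 * r[0], 1.0 - 0.2 * r[0], 1.0 - 0.3 * r[0]] for r in rows]


# Keep only the survival probability at the last of the three time buckets.
predict_last_bucket = make_single_bucket_predict(toy_predict_curve, bucket_index=2)
scalars = predict_last_bucket([[1.0], [2.0]])
print(scalars)
```

With a scalar output like this, an explainer returns a single attribution per feature instead of one per time bucket.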
@davivieirab My model:
xgbse_model = XGBSEKaplanTree(params)
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)
SHAP model:
shap_values = shap.Explainer(bootstrap_estimator.predict, data, feature_names=feature_names, max_evals=2000)
shaps = shap_values(data)
@davivieirab If I don't pass max_evals to shap.Explainer, it won't run at all. With max_evals=2000, SHAP runs, but the projected time to finish is 10 hours.
My dataset has 330 rows and 900 columns, and I used a train-test split (25% for test).
@hellorp1990, XGBSEBootstrapEstimator produces a multi-output prediction: for each sample you get a whole survival function, with a survival probability for each time bucket evaluated. Consequently, for each sample you get one array of SHAP values (one value per feature) for each time bucket.
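The "one array per time bucket" structure can be seen on a toy problem. The sketch below computes exact Shapley values by enumerating feature subsets for a hypothetical two-bucket "survival curve" function. It does not use SHAP or XGBSE, and it uses exact enumeration rather than the KernelSHAP approximation, so it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def toy_survival_curve(x):
    """Hypothetical stand-in model: one survival probability per time bucket."""
    risk = 0.1 * x[0] + 0.2 * x[1] + 0.05 * x[2]
    return [1.0 - risk, 1.0 - 2.0 * risk]


def exact_shapley(predict, x, baseline):
    """Exact Shapley values per output; 'absent' features take the baseline value."""
    n = len(x)
    n_outputs = len(predict(x))
    # values[t][i] = contribution of feature i to output (time bucket) t
    values = [[0.0] * n for _ in range(n_outputs)]

    def masked(subset):
        return [x[j] if j in subset else baseline[j] for j in range(n)]

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = predict(masked(set(subset) | {i}))
                without_i = predict(masked(set(subset)))
                for t in range(n_outputs):
                    values[t][i] += weight * (with_i[t] - without_i[t])
    return values


shap_per_bucket = exact_shapley(
    toy_survival_curve, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]
)
for t, bucket_values in enumerate(shap_per_bucket):
    print(f"time bucket {t}: {bucket_values}")
```

The result has one list per time bucket, each holding one value per feature, and each list sums to the difference between the prediction for `x` and the prediction for the baseline in that bucket.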
Find a code example below - references: SHAP values for multi-output problems, using KernelSHAP with XGBoost:
import pandas as pd
import shap
from xgbse import XGBSEKaplanTree, XGBSEBootstrapEstimator
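xgbse_model = XGBSEKaplanTree(your_params)
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)
## the estimator must be fitted before it can predict (X_train and y_train are assumed to be defined)
bootstrap_estimator.fit(X_train, y_train)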
columns = X_train.columns
## KernelSHAP passes data as a NumPy array, which has no column names, so we restore them
## source: https://gist.github.com/noleto/05dfa4a691ebbc8816c035b86d2d00d4#file-shap_xgboost-py-L46
def xgbse_predict(data_asarray):
    data_asframe = pd.DataFrame(data_asarray, columns=columns)
    return bootstrap_estimator.predict(data_asframe)
#### Kernel SHAP
shap_kernel_explainer = shap.KernelExplainer(xgbse_predict, X_train.head(100))
# Explain a single instance - output: (1, n_time_buckets, n_features)
shap_one = shap_kernel_explainer.shap_values(X_train.iloc[0])
# Get explanations for the first time bucket
first_time_bucket_shap_values = pd.Series(shap_one[0])
# Print shap values for the first time bucket and the corresponding features
print(pd.concat([first_time_bucket_shap_values, pd.Series(columns)], axis=1))
You will get something like (for the first time bucket):
shap_value | feature |
---|---|
0.001919 | x0 |
0.006411 | x1 |
0.000411 | x2 |
0.002464 | x3 |
0.000239 | x4 |
0.000893 | x5 |
0.002441 | x6 |
0.000117 | x7 |
0.009901 | x8 |
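To read a table like this more easily, the values can be sorted by magnitude. This is a small sketch using only the numbers shown above; the variable names are illustrative.

```python
# SHAP values for the first time bucket, copied from the table above.
first_bucket = {
    "x0": 0.001919, "x1": 0.006411, "x2": 0.000411,
    "x3": 0.002464, "x4": 0.000239, "x5": 0.000893,
    "x6": 0.002441, "x7": 0.000117, "x8": 0.009901,
}

# Rank features by absolute SHAP value, largest first.
ranked = sorted(first_bucket.items(), key=lambda item: abs(item[1]), reverse=True)
for feature, value in ranked[:3]:
    print(f"{feature}: {value:.6f}")
```

The same ranking would need to be repeated for each time bucket, since every bucket has its own set of SHAP values.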
As an action item, we will add a notebook with brief documentation on how to use SHAP with the XGBSE library.
Hello davivieirab, have you added documentation on how to use SHAP with XGBSE? When I run my code the way you described above, it fails with an error. The following is my code:
from xgbse import XGBSEDebiasedBCE
# fitting xgbse model
xgbse_model = XGBSEDebiasedBCE()
xgbse_model.fit(X_train, y_train, time_bins=TIME_BINS)
# predicting
y_pred = xgbse_model.predict(X_test)
import shap
from xgbse import XGBSEKaplanTree, XGBSEBootstrapEstimator
# kernel shap sends data as numpy array which has no column names, so we fix it
# source: https://gist.github.com/noleto/05dfa4a691ebbc8816c035b86d2d00d4#file-shap_xgboost-py-L46
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)
def xgbse_predict(data_asarray):
    data_asframe = pd.DataFrame(data_asarray, columns=columns)
    return bootstrap_estimator.predict(data_asframe)
columns = X_train.columns
shap_kernel_explainer = shap.KernelExplainer(xgbse_predict, X_train)
# Kernel SHAP
# Explain a single instance - output: (1, n_time_buckets, n_features)
shap_one = shap_kernel_explainer.shap_values(X_train.iloc[0])
# Get explanations for the first time bucket
first_time_bucket_shap_values = pd.Series(shap_one[0])
print(pd.concat([first_time_bucket_shap_values, pd.Series(columns)], axis=1))
Error report: Provided model function fails when applied to the provided data set. 'XGBSEBootstrapEstimator' object has no attribute 'estimators_'
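One possible reading of this error, not confirmed in the thread: the code above fits xgbse_model but never calls fit on bootstrap_estimator, and attributes with a trailing underscore such as estimators_ are conventionally created only inside fit. The sketch below reproduces that pattern with a hypothetical toy class in plain Python; it is not the XGBSE implementation.

```python
import random


class ToyBootstrapEstimator:
    """Hypothetical stand-in: `estimators_` exists only after `fit` is called."""

    def __init__(self, n_estimators=5, random_state=42):
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, y):
        rng = random.Random(self.random_state)
        # Each "estimator" is just the mean of a bootstrap resample of y.
        self.estimators_ = []
        for _ in range(self.n_estimators):
            sample = [rng.choice(y) for _ in y]
            self.estimators_.append(sum(sample) / len(sample))
        return self

    def predict(self):
        return sum(self.estimators_) / len(self.estimators_)


model = ToyBootstrapEstimator()
try:
    model.predict()  # called before fit
    failed_before_fit = False
    message = ""
except AttributeError as err:
    failed_before_fit = True
    message = str(err)
print(message)

model.fit([1.0, 2.0, 3.0, 4.0])
prediction = model.predict()  # works once fit has run
```

If the same applies here, calling bootstrap_estimator.fit(X_train, y_train) before building the KernelExplainer would be the first thing to try. Passing a small background sample such as X_train.head(100), as in the earlier reply, should also keep the run time manageable.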