SHAP error
ValueError: Columns of dataframes passed to fit() and transform() differ:
- f
- e
- d
- c
- b
- a
- 0
- 1
- 2
- 3
- 4
- 5
Hi @wangqiankun0201
Thanks for reporting this issue. Could you please provide more details or code to reproduce the error? Also, it seems like the column mismatch might be resolved by reindexing the dataframe. Can you try that and see if it helps?
Something like: df_transform = df_transform.reindex(columns=df_fit.columns)
Thanks!
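A minimal sketch of the reindexing suggestion above, using small hypothetical frames (`df_fit` and the derived frames are stand-ins, not data from this issue). It also illustrates a caveat: `reindex` aligns by label, so it only helps when both frames share the same column names in a different order. If the labels differ entirely, as the a to f and 0 to 5 names in the error message suggest, reindexing yields all-NaN columns, and assigning matching column names is the safer route, assuming the columns are already in the same positional order.

```python
import pandas as pd

# Hypothetical stand-in for the frame passed to fit().
df_fit = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

# Case 1: same labels, different order. reindex restores the fit-time order.
df_shuffled = df_fit[["c", "a", "b"]]
aligned = df_shuffled.reindex(columns=df_fit.columns)

# Case 2: entirely different labels. reindex finds no matching columns,
# so every requested column is filled with NaN.
df_renamed = df_fit.copy()
df_renamed.columns = ["0", "1", "2"]
all_nan = df_renamed.reindex(columns=df_fit.columns)

# When the columns are positionally identical, assigning the labels
# keeps the data intact.
relabelled = df_renamed.copy()
relabelled.columns = df_fit.columns

print(list(aligned.columns))
print(all_nan.isna().all().all())
print(relabelled.equals(df_fit))
```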
Adding the following code solved the problem:
X_train.columns = ["0", "1", "2", "3", "4", "5"]
X_test.columns = ["0", "1", "2", "3", "4", "5"]
I would also like to ask: computing the SHAP values took a long time. Is that normal?
Why does the following code raise ValueError: Number of processes must be at least 1? I am using Google Colab.
shap_values = interpretability.shap.parallel_permutation_shap(reg, X_test, n_jobs=-1)
Could you show what X_test looks like? This error message seems to appear for unusual datasets, such as empty or single-row ones (https://github.com/nalepae/pandarallel/issues/141).
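For context, the wording of the error quoted above matches the message that Python's standard `multiprocessing.Pool` raises when it is asked for fewer than one worker. Whether `parallel_permutation_shap` reaches that code path is an assumption, not something confirmed in this thread. The sketch below shows how a joblib-style `n_jobs` value such as `-1` could be mapped to a valid worker count before a pool is created; `resolve_n_jobs` and `pool_error_message` are hypothetical helpers, not part of any library mentioned here.

```python
import os
from multiprocessing import Pool


def resolve_n_jobs(n_jobs, cpu_count=None):
    """Hypothetical helper: map a joblib-style n_jobs to a positive worker count."""
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to a single worker.
        cpu_count = os.cpu_count() or 1
    if n_jobs is None:
        return 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # joblib convention: -1 means all cores, -2 all but one, and so on.
        return max(1, cpu_count + 1 + n_jobs)
    return n_jobs


def pool_error_message(processes):
    """Return the message Pool raises for an invalid worker count, else None."""
    try:
        with Pool(processes=processes):
            return None
    except ValueError as exc:
        return str(exc)


print(resolve_n_jobs(-1, cpu_count=2))
print(resolve_n_jobs(-5, cpu_count=2))
```

If the worker count is derived from the data (for example, one worker per row or per chunk), an empty input could also produce a count of zero, which would be consistent with the empty-dataset reports linked above.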
| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1016 | 18.4 | 6 | 280 | 0.5 | 15 |
| 1422 | 18.4 | 6 | 120 | 0.5 | 10 |
| 1016 | 18.4 | 7 | 120 | 0.5 | 15 |
| 1016 | 18.4 | 2 | 120 | 0.5 | 15 |
| 1016 | 18.4 | 5 | 120 | 0.5 | 0 |
| ... | ... | ... | ... | ... | ... |
| 1016 | 26.4 | 6 | 120 | 0.5 | 15 |
| 1016 | 18.4 | 6 | 60 | 0.5 | 10 |
| 1016 | 18.4 | 6 | 120 | 0.4 | 0 |
| 1016 | 18.4 | 6 | 160 | 0.5 | 0 |
| 1016 | 18.4 | 6 | 180 | 0.5 | 5 |

My X_test looks like this.
Hey @wangqiankun0201, sorry for the delay in replying. Are you still facing this error with the latest version of TabPFN and its dependencies? If so, could you attach a minimal script that reproduces the error?
@Qikuu we are closing this issue for now, but feel free to reopen it if still relevant!
@oscarkey @noahho Hey guys, I'm getting that error too. Specifically, I'm using the KernelExplainer method from the shap library. See my code for more details:
# ---- Normalize inputs to DataFrame with headers ----
X_bg_df = _as_dataframe(X_bg)
X_s_df = _as_dataframe(X_sample, ref=X_bg_df)
X_bg_df = _ensure_bg_sample(X_bg_df, max_rows=100, random_state=7)
# ---- Unwrap final estimator ----
est = _unwrap_estimator(clf)
est_name = name.lower()
is_tree = any(k in est_name for k in ["xgb", "forest", "gradientboosting", "histgradientboosting", "catboost"])
is_linear = any(k in est_name for k in ["logisticregression", "ridge", "lasso", "linearsvc"])
is_transformers = any(k in est_name for k in ["tabpfn", "tabtransformers"])
# ---- Use a masker to avoid deprecations (feature_perturbation) ----
masker = shap.maskers.Independent(X_bg_df)
try:
    if is_tree:
        # Prefer TreeExplainer for tree-based models
        print("DEBUG MODEL: tree model")
        explainer = shap.TreeExplainer(est, data=masker)
        exp = explainer(X_s_df)  # Explanation (new API)
    elif is_linear:
        # LinearExplainer for linear models; let SHAP handle link internally
        print("DEBUG MODEL: linear model")
        explainer = shap.Explainer(est, masker=masker, algorithm="linear")
        exp = explainer(X_s_df)
    elif is_transformers:
        print("DEBUG MODEL: transformer model")
        n_samples_bg = 50
        X_bg_sub = X_bg_df.sample(n=n_samples_bg, random_state=13)
        bg_array = X_bg_sub.values
        # Explainer (positive class)
        pred_fn, _ = _get_pred_fn_positive_proba(clf)
        explainer = shap.KernelExplainer(pred_fn, bg_array, link="logit")
        exp = explainer(X_bg_sub)
    else:
        # KernelExplainer as fallback (slower); explain positive class probability
        print("DEBUG MODEL: fallback model")
        bg_frac = 0.25
        X_bg_sub = X_bg_df.sample(frac=bg_frac, random_state=13)
        print(f"Background data size for the explainer: {X_bg_sub.shape[0]} samples.")
        bg_array = X_bg_sub.values  # KernelExplainer takes an array-like
        # Explainer (positive class)
        pred_fn, _ = _get_pred_fn_positive_proba(clf)
        explainer = shap.KernelExplainer(pred_fn, bg_array, link="logit")
        # Explain the set of interest
        exp = explainer(X_bg_sub)
except Exception as e:
    raise RuntimeError(f"[{name}] Failed to compute SHAP values for binary classifier: {e}") from e
Hey @Felipecordeiiro, I'm now looking into this! I tried to reproduce the error using your code but wasn't able to. Would you be able to share X_bg and X_sample, or at least their shapes? I'm guessing clf is a TabPFNClassifier, but could you also share the settings you're using, along with the versions of the tabpfn and shap packages?
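The details requested above could be gathered with a short snippet along these lines. It is only a sketch: `collect_versions` and `describe` are hypothetical helper names, and the package list is an assumption about what is installed in the environment. In the real script, `describe` would be called on X_bg and X_sample.

```python
from importlib import metadata


def collect_versions(packages):
    """Return a dict mapping each package name to its version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def describe(name, obj):
    """One-line summary of an input: its type and its shape, if it has one."""
    shape = getattr(obj, "shape", None)
    return f"{name}: type={type(obj).__name__}, shape={shape}"


# Assumed package names; adjust to the environment being debugged.
for pkg, ver in collect_versions(["tabpfn", "shap", "scikit-learn", "pandas", "numpy"]).items():
    print(f"{pkg}=={ver}")

# Placeholder input; replace with X_bg and X_sample from the failing script.
print(describe("example", [[1, 2], [3, 4]]))
```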
@Felipecordeiiro I'm closing this issue again for now, but feel free to reopen!