SHAP error

wangqiankun0201 opened this issue 8 months ago • 9 comments

ValueError: Columns of dataframes passed to fit() and transform() differ:

  • f
  • e
  • d
  • c
  • b
  • a
  • 0
  • 1
  • 2
  • 3
  • 4
  • 5
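One way to read the list above is as the columns that appear in only one of the two DataFrames. The sketch below assumes the fit-time frame had columns a to f and the transform-time frame had default labels 0 to 5. The thread does not confirm this reading. It computes that difference with plain pandas, and `column_diff` is a hypothetical helper, not part of TabPFN or shap.

```python
import pandas as pd


def column_diff(fit_df: pd.DataFrame, transform_df: pd.DataFrame):
    """Return (columns only seen at fit time, columns only seen at transform time)."""
    fit_cols = set(fit_df.columns)
    transform_cols = set(transform_df.columns)
    # key=str lets mixed str/int labels be sorted without a TypeError
    return (
        sorted(fit_cols - transform_cols, key=str),
        sorted(transform_cols - fit_cols, key=str),
    )


# Assumed setup: the same 6 features, but different column labels.
fit_df = pd.DataFrame([[0] * 6], columns=list("abcdef"))
transform_df = pd.DataFrame([[0] * 6])  # default integer labels 0..5

only_fit, only_transform = column_diff(fit_df, transform_df)
print(only_fit)        # ['a', 'b', 'c', 'd', 'e', 'f']
print(only_transform)  # [0, 1, 2, 3, 4, 5]
```

The string "0" and the integer 0 are different column labels in pandas. A diff like this can therefore be non-empty even when the printed names look identical.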

wangqiankun0201 • Apr 05 '25

Hi @wangqiankun0201

Thanks for reporting this issue. Could you please provide more details or code to reproduce the error? Also, it seems like the column mismatch might be resolved by reindexing the dataframe. Can you try that and see if it helps?

Something like: df_transform = df_transform.reindex(columns=df_fit.columns)
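Here is a quick pandas check of what reindex does. It restores the fit-time column order when both frames share the same labels. When the labels differ entirely, as in the a-f vs 0-5 list (assuming that reading), every reindexed column comes back as NaN. In that case the labels themselves have to be made consistent first. The toy frames below are made up for illustration.

```python
import pandas as pd

df_fit = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Case 1: same labels, different order -> reindex restores the fit-time order.
df_swapped = df_fit[["b", "a"]]
aligned = df_swapped.reindex(columns=df_fit.columns)

# Case 2: different labels (default integer columns 0, 1) -> no label matches,
# so reindex returns all-NaN columns named "a" and "b".
df_unnamed = pd.DataFrame([[1, 3], [2, 4]])
nan_filled = df_unnamed.reindex(columns=df_fit.columns)

# Workaround for case 2: assign the fit-time labels directly.
# This is only correct if both frames already hold the features in the same order.
relabeled = df_unnamed.set_axis(df_fit.columns, axis=1)
```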

Thanks!

anuragg1209 • Apr 05 '25

```python
X_train.columns = ["0", "1", "2", "3", "4", "5"]
X_test.columns = ["0", "1", "2", "3", "4", "5"]
```

Adding the code above solved the problem. I would also like to ask: computing SHAP takes me a long time. Is that normal?
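Some slowness is expected. Permutation-style SHAP estimators score the model many times per explained row. As we understand it, TabPFN also predicts by in-context learning over the training data, so each of those calls is comparatively expensive. The sketch below is a generic Monte Carlo permutation estimator, not TabPFN's or shap's implementation. It counts how many input rows it sends to the model, and the linear toy model and its weights are made up for illustration.

```python
import numpy as np


def permutation_shapley(f, x, baseline, n_permutations, rng):
    """Monte Carlo permutation estimate of Shapley values for a single row x.

    f maps a 2-D array (one input per row) to one score per row. Features not
    yet added along a permutation are taken from a single baseline row; SHAP
    libraries usually average over a background set instead, which multiplies
    the number of model evaluations further.
    Returns (estimated Shapley values, number of rows passed to f).
    """
    n_features = x.shape[0]
    phi = np.zeros(n_features)
    rows_scored = 0
    for _ in range(n_permutations):
        order = rng.permutation(n_features)
        # Row k holds x's values for the first k features of `order`, baseline elsewhere.
        points = np.tile(baseline.astype(float), (n_features + 1, 1))
        for k, j in enumerate(order):
            points[k + 1:, j] = x[j]
        scores = f(points)
        rows_scored += points.shape[0]
        # Consecutive score differences are the marginal contributions, in `order`.
        phi[order] += np.diff(scores)
    return phi / n_permutations, rows_scored


rng = np.random.default_rng(0)
n_features = 6  # X_test in this thread has 6 columns
weights = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5])
x = np.arange(n_features, dtype=float)
phi, rows_scored = permutation_shapley(lambda X: X @ weights, x, np.zeros(n_features), 100, rng)
print(rows_scored)  # 700 = 100 permutations * (6 + 1) rows, for ONE explained row
```

For this estimator the total cost is n_explained_rows × n_permutations × (n_features + 1) model rows. Explaining a subsample of X_test is the most direct way to cut the runtime.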

wangqiankun0201 • Apr 06 '25

```python
shap_values = interpretability.shap.parallel_permutation_shap(reg, X_test, n_jobs=-1)
```

Why does the code above raise "ValueError: Number of processes must be at least 1"? I'm using Google Colab.

wangqiankun0201 • Apr 06 '25

Could you show what X_test looks like? This error message can apparently appear for unusual datasets, such as empty or single-row ones (https://github.com/nalepae/pandarallel/issues/141).
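One plausible cause is the following, offered as an assumption because the thread never confirms it. Python's multiprocessing.Pool raises exactly this message when processes is below 1, and unlike joblib it does not treat -1 as "all cores". So if n_jobs=-1 reaches such a pool, or a worker count derived from a tiny dataset ends up at 0, the call fails. The sketch resolves n_jobs the joblib way and caps it by the number of rows. `resolve_n_jobs` and `pool_error` are hypothetical helpers, not TabPFN functions.

```python
import multiprocessing
import os


def resolve_n_jobs(n_jobs: int, n_rows: int) -> int:
    """Turn a joblib-style n_jobs into a positive worker count.

    Negative values follow the joblib convention: -1 means all CPUs,
    -2 all but one, and so on. The result is capped at n_rows (more
    workers than rows cannot help) and is never below 1.
    """
    if n_jobs == 0:
        raise ValueError("n_jobs=0 is not a valid worker count")
    n_cpus = os.cpu_count() or 1
    workers = n_jobs if n_jobs > 0 else n_cpus + 1 + n_jobs
    return max(1, min(workers, n_rows))


def pool_error(processes: int):
    """Return the ValueError message multiprocessing.Pool raises for `processes`, or None."""
    try:
        pool = multiprocessing.Pool(processes=processes)
    except ValueError as err:
        return str(err)
    pool.terminate()
    return None


print(pool_error(-1))  # Number of processes must be at least 1
```

If parallel_permutation_shap accepts a positive integer, passing an explicit value such as n_jobs=resolve_n_jobs(-1, len(X_test)) is worth trying.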

LeoGrin • Apr 07 '25

My X_test looks like this:

| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1016 | 18.4 | 6 | 280 | 0.5 | 15 |
| 1422 | 18.4 | 6 | 120 | 0.5 | 10 |
| 1016 | 18.4 | 7 | 120 | 0.5 | 15 |
| 1016 | 18.4 | 2 | 120 | 0.5 | 15 |
| 1016 | 18.4 | 5 | 120 | 0.5 | 0 |
| ... | ... | ... | ... | ... | ... |
| 1016 | 26.4 | 6 | 120 | 0.5 | 15 |
| 1016 | 18.4 | 6 | 60 | 0.5 | 10 |
| 1016 | 18.4 | 6 | 120 | 0.4 | 0 |
| 1016 | 18.4 | 6 | 160 | 0.5 | 0 |
| 1016 | 18.4 | 6 | 180 | 0.5 | 5 |

wangqiankun0201 • Apr 08 '25

Hey @wangqiankun0201, sorry for the delay in replying. Are you still facing this error with the latest version of TabPFN and its dependencies? If so, could you attach a minimal script that reproduces the error?

oscarkey • Aug 15 '25

@Qikuu we are closing this issue for now, but feel free to reopen it if still relevant!

noahho • Sep 03 '25

@oscarkey @noahho Hey guys, I'm getting that error too. Specifically, I'm using KernelExplainer from the shap library. See my code for more details:

```python
import shap

# ---- Normalize inputs to DataFrame with headers ----
X_bg_df = _as_dataframe(X_bg)
X_s_df = _as_dataframe(X_sample, ref=X_bg_df)
X_bg_df = _ensure_bg_sample(X_bg_df, max_rows=100, random_state=7)

# ---- Unwrap final estimator ----
est = _unwrap_estimator(clf)
est_name = name.lower()
is_tree = any(k in est_name for k in ["xgb", "forest", "gradientboosting", "histgradientboosting", "catboost"])
is_linear = any(k in est_name for k in ["logisticregression", "ridge", "lasso", "linearsvc"])
is_transformers = any(k in est_name for k in ["tabpfn", "tabtransformers"])

# ---- Use a masker to avoid deprecations (feature_perturbation) ----
masker = shap.maskers.Independent(X_bg_df)

try:
    if is_tree:
        # Prefer TreeExplainer for tree-based models
        print("DEBUG MODEL: tree model")
        explainer = shap.TreeExplainer(est, data=masker)
        exp = explainer(X_s_df)  # Explanation (new API)

    elif is_linear:
        # LinearExplainer for linear models; let SHAP handle the link internally
        print("DEBUG MODEL: linear model")
        explainer = shap.Explainer(est, masker=masker, algorithm="linear")
        exp = explainer(X_s_df)

    elif is_transformers:
        print("DEBUG MODEL: transformer model")
        # DataFrame.sample(n=...) raises if n exceeds the number of rows
        n_samples_bg = min(50, len(X_bg_df))
        X_bg_sub = X_bg_df.sample(n=n_samples_bg, random_state=13)
        bg_array = X_bg_sub.values  # KernelExplainer accepts array-like background data

        # Explainer for the positive class
        pred_fn, _ = _get_pred_fn_positive_proba(clf)
        explainer = shap.KernelExplainer(pred_fn, bg_array, link="logit")

        # Explain the rows of interest, not the background sample
        exp = explainer(X_s_df)

    else:
        # KernelExplainer as fallback (slower); explain positive-class probability
        print("DEBUG MODEL: fallback model")
        bg_frac = 0.25
        X_bg_sub = X_bg_df.sample(frac=bg_frac, random_state=13)

        print(f"Background data size for the explainer: {X_bg_sub.shape[0]} samples.")
        bg_array = X_bg_sub.values  # KernelExplainer accepts array-like background data

        # Explainer for the positive class
        pred_fn, _ = _get_pred_fn_positive_proba(clf)
        explainer = shap.KernelExplainer(pred_fn, bg_array, link="logit")

        # Explain the rows of interest, not the background sample
        exp = explainer(X_s_df)

except Exception as e:
    raise RuntimeError(f"[{name}] Failed to compute SHAP values for binary classifier: {e}") from e
```
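In the KernelExplainer branches, the background is a NumPy array (bg_array), so pred_fn is typically called with NumPy arrays. The thread does not show whether clf was fitted on a DataFrame. If it was, a model that checks column labels would see default labels instead of the fit-time ones, which would match the column error from the original report. The sketch below shows one workaround under that assumption: wrap predict_proba so every array is turned back into a DataFrame with the fit-time columns. `StrictColumnsModel` is a toy stand-in and `make_positive_proba_fn` is hypothetical. Neither is the poster's `_get_pred_fn_positive_proba` nor a TabPFN API.

```python
import numpy as np
import pandas as pd


class StrictColumnsModel:
    """Toy classifier that rejects inputs whose column labels differ from fit time."""

    def fit(self, X: pd.DataFrame, y):
        self.columns_ = list(X.columns)
        return self

    def predict_proba(self, X):
        # NumPy input has no labels; treat it as default integer labels 0..n-1.
        cols = list(X.columns) if isinstance(X, pd.DataFrame) else list(range(X.shape[1]))
        if cols != self.columns_:
            raise ValueError("Columns of dataframes passed to fit() and transform() differ")
        # Toy probability: logistic function of the row sum.
        p = 1.0 / (1.0 + np.exp(-np.asarray(X, dtype=float).sum(axis=1)))
        return np.column_stack([1.0 - p, p])


def make_positive_proba_fn(model, columns):
    """Wrap predict_proba so arrays coming from SHAP get the fit-time column labels."""
    columns = list(columns)

    def pred_fn(X):
        X_df = pd.DataFrame(np.asarray(X), columns=columns)
        return model.predict_proba(X_df)[:, 1]  # positive-class probability

    return pred_fn


X_train = pd.DataFrame(np.zeros((4, 3)), columns=["a", "b", "c"])
model = StrictColumnsModel().fit(X_train, [0, 1, 0, 1])
pred_fn = make_positive_proba_fn(model, X_train.columns)
print(pred_fn(np.zeros((2, 3))))  # [0.5 0.5], no column error
```

A wrapper like pred_fn could then be passed to shap.KernelExplainer in place of the original prediction function.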

Felipecordeiiro • Sep 17 '25

Hey @Felipecordeiiro, I'm looking into this now! I tried to reproduce the error using your code but wasn't able to. Would you be able to share X_bg and X_sample, or at least their shapes? I'm guessing clf is a TabPFNClassifier, but could you also share the settings you're using, and the versions of the tabpfn and shap packages?

oscarkey • Oct 15 '25

@Felipecordeiiro I'm closing this issue again for now, but feel free to reopen!

oscarkey • Nov 13 '25