TabPFN SHAP error

ValueError: Columns of dataframes passed to fit() and transform() differ:

f, e, d, c, b, a

0, 1, 2, 3, 4, 5

Apr 05 '25 04:04 wangqiankun0201

Hi @wangqiankun0201

Thanks for reporting this issue. Could you please provide more details or code to reproduce the error? Also, it seems like the column mismatch might be resolved by reindexing the dataframe. Can you try that and see if it helps?

Something like: df_transform = df_transform.reindex(columns=df_fit.columns)
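As a rough sketch of that suggestion (the two frames below are hypothetical stand-ins, not data from this issue), reindexing might look like the following. Note that `reindex` fills any column that is absent from the second frame with NaN, so it may be worth checking for missing columns first:

```python
import pandas as pd

# Hypothetical stand-ins for the frames passed to fit() and transform().
df_fit = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
df_transform = pd.DataFrame({"c": [7, 8], "a": [9, 10], "b": [11, 12]})

# reindex() silently creates NaN columns for names it cannot find,
# so fail loudly if a fit-time column is missing at transform time.
missing = set(df_fit.columns) - set(df_transform.columns)
if missing:
    raise ValueError(f"Columns missing at transform time: {sorted(missing)}")

# Reorder the transform-time columns to match the fit-time order.
df_transform = df_transform.reindex(columns=df_fit.columns)
print(list(df_transform.columns))
```

This only helps when both frames use the same column names in a different order; if the names themselves differ (for example integers versus strings), they would need to be renamed first.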

Thanks!

Apr 05 '25 13:04 anuragg1209

X_train.columns = ["0", "1", "2", "3", "4", "5"]
X_test.columns = ["0", "1", "2", "3", "4", "5"]

Adding the code above solved the problem. I would also like to ask: computing the SHAP values takes a long time. Is that normal?

Apr 06 '25 03:04 wangqiankun0201

shap_values = interpretability.shap.parallel_permutation_shap(reg, X_test, n_jobs=-1)

Why does the code above raise this error: "ValueError: Number of processes must be at least 1"? I am using Google Colab.

Apr 06 '25 04:04 wangqiankun0201

Could you show what X_test looks like? It seems that this error message can appear for unusual datasets, such as empty or single-row ones (https://github.com/nalepae/pandarallel/issues/141).
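For context, "Number of processes must be at least 1" is the message Python's `multiprocessing.Pool` raises when it is asked for fewer than one worker. As a minimal sketch of a guard against that, assuming the worker count is derived from `n_jobs` and the number of rows (the helper `resolve_n_jobs` is hypothetical and not part of TabPFN or its extensions):

```python
import os


def resolve_n_jobs(n_jobs, n_rows):
    """Hypothetical helper: turn an n_jobs request into a worker count >= 1."""
    if n_rows < 1:
        raise ValueError("X_test is empty; there is nothing to explain.")
    cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    if n_jobs is None or n_jobs == 0:
        workers = 1
    elif n_jobs < 0:
        # joblib-style convention: -1 means all cores, -2 all but one, ...
        workers = cpu_count + 1 + n_jobs
    else:
        workers = n_jobs
    # Never use more workers than rows, and never fewer than one.
    return max(1, min(workers, n_rows))


print(resolve_n_jobs(-1, n_rows=100))
```

Passing the resolved value instead of `n_jobs=-1` may be worth trying, although whether that avoids the error in this particular case is not confirmed here.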

Apr 07 '25 08:04 LeoGrin

0 | 1 | 2 | 3 | 4 | 5
1016 | 18.4 | 6 | 280 | 0.5 | 15
1422 | 18.4 | 6 | 120 | 0.5 | 10
1016 | 18.4 | 7 | 120 | 0.5 | 15
1016 | 18.4 | 2 | 120 | 0.5 | 15
1016 | 18.4 | 5 | 120 | 0.5 | 0
... | ... | ... | ... | ... | ...
1016 | 26.4 | 6 | 120 | 0.5 | 15
1016 | 18.4 | 6 | 60 | 0.5 | 10
1016 | 18.4 | 6 | 120 | 0.4 | 0
1016 | 18.4 | 6 | 160 | 0.5 | 0
1016 | 18.4 | 6 | 180 | 0.5 | 5

My X_test looks like this.

Apr 08 '25 05:04 wangqiankun0201

Hey @wangqiankun0201, sorry for the delay in replying. Are you still facing this error with the latest version of TabPFN and its dependencies? If so, could you attach a minimal script that reproduces the error?

Aug 15 '25 09:08 oscarkey

@Qikuu we are closing this issue for now, but feel free to reopen it if still relevant!

Sep 03 '25 17:09 noahho

@oscarkey @noahho Hey guys, I'm getting that error too. Specifically, I'm using the KernelExplainer method from the shap library. See my code for more details:

    # ---- Normalize inputs to DataFrame with headers ----
    X_bg_df = _as_dataframe(X_bg)
    X_s_df  = _as_dataframe(X_sample, ref=X_bg_df)
    X_bg_df = _ensure_bg_sample(X_bg_df, max_rows=100, random_state=7)

    # ---- Unwrap final estimator ----
    est = _unwrap_estimator(clf)
    est_name = name.lower()
    is_tree   = any(k in est_name for k in ["xgb", "forest", "gradientboosting", "histgradientboosting", "catboost"])
    is_linear = any(k in est_name for k in ["logisticregression", "ridge", "lasso", "linearsvc"])
    is_transformers = any(k in est_name for k in ["tabpfn", "tabtransformers"])

    # ---- Use a masker to avoid deprecations (feature_perturbation) ----
    masker = shap.maskers.Independent(X_bg_df)

    try:
        if is_tree:
            # Prefer TreeExplainer for tree-based models
            print("DEBUG MODEL: tree model")
            explainer = shap.TreeExplainer(est, data=masker)
            exp = explainer(X_s_df)             # Explanation (new API)

        elif is_linear:
            # LinearExplainer for linear models; let SHAP handle link internally
            print("DEBUG MODEL: linear model")
            explainer = shap.Explainer(est, masker=masker, algorithm="linear")
            exp = explainer(X_s_df)
        elif is_transformers:
            print("DEBUG MODEL: transformer model")
            n_samples_bg = 50
            X_bg_sub = X_bg_df.sample(n=n_samples_bg, random_state=13)
            bg_array = X_bg_sub.values

            # Explainer (positive class)
            pred_fn, _ = _get_pred_fn_positive_proba(clf)

            explainer = shap.KernelExplainer(pred_fn, bg_array, link="logit")
            exp = explainer(X_bg_sub)
        else:
            # KernelExplainer as fallback (slower); explain positive class probability
            print("DEBUG MODEL: other model (fallback)")
            bg_frac = 0.25
            X_bg_sub = X_bg_df.sample(frac=bg_frac, random_state=13)
            
            print(f"Background data size for the explainer: {X_bg_sub.shape[0]} samples.")
            bg_array = X_bg_sub.values  # KernelExplainer takes array-like input

            # Explainer (positive class)
            pred_fn, _ = _get_pred_fn_positive_proba(clf)
            explainer = shap.KernelExplainer(pred_fn, bg_array, link="logit")

            # Explain the set of interest
            exp = explainer(X_bg_sub)

    except Exception as e:
        raise RuntimeError(f"[{name}] Failed to compute SHAP values for binary classifier: {e}") from e

Sep 17 '25 00:09 Felipecordeiiro

Hey @Felipecordeiiro, I'm now looking into this! I tried to reproduce the error using your code but wasn't able to. Would you be able to share X_bg and X_sample, or at least their shapes? I'm guessing clf is a TabPFNClassifier, but could you also share the settings you're using? And the versions of the tabpfn and shap packages?
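One possible cause that may be worth ruling out (an assumption, not something confirmed in this thread): shap.KernelExplainer hands plain NumPy arrays to the prediction function, so a model that was fitted on a DataFrame could see different column names at prediction time. A minimal sketch of a wrapper that restores the column names before predicting is below; `make_dataframe_predict_fn` and `dummy_predict` are hypothetical names used only for illustration:

```python
import numpy as np
import pandas as pd


def make_dataframe_predict_fn(predict_fn, columns):
    """Hypothetical wrapper: rebuild a DataFrame before calling predict_fn."""
    columns = list(columns)

    def wrapped(X):
        X = np.asarray(X)
        if X.ndim == 1:
            # Treat a single sample as a one-row batch.
            X = X.reshape(1, -1)
        return predict_fn(pd.DataFrame(X, columns=columns))

    return wrapped


def dummy_predict(df):
    """Stand-in for a fitted model that insists on its fit-time column names."""
    if list(df.columns) != ["a", "b"]:
        raise ValueError("Columns differ from those seen at fit time")
    return df["a"].to_numpy() + df["b"].to_numpy()


wrapped_fn = make_dataframe_predict_fn(dummy_predict, ["a", "b"])
result = wrapped_fn(np.array([[1.0, 2.0], [3.0, 4.0]]))
print(result)
```

In the snippet posted above, this would presumably mean wrapping pred_fn with the columns of X_bg_df before passing it to shap.KernelExplainer.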

Oct 15 '25 13:10 oscarkey

@Felipecordeiiro I'm closing this issue again for now, but feel free to reopen!

Nov 13 '25 11:11 oscarkey