
Feature Request for Regression Tasks - Custom Loss Functions and Embedding Utilization

Open hexuwei-epri opened this issue 2 months ago • 1 comments

Describe the workflow you want to enable

Hi TabPFN Development Team,

First, thank you for this amazing work! I'm currently exploring TabPFN for renewable energy power prediction tasks and have some questions regarding regression functionality.

Background Context: I'm working on solar/wind power forecasting where evaluation metrics like MAPE (Mean Absolute Percentage Error) are more meaningful than standard MSE/MAE, due to the specific characteristics of energy data.

Questions:

Custom Loss Functions for Regression: Currently, when using TabPFNRegressor.fit(), is it possible to use custom loss functions like MAPE instead of the default loss? If not, would this be a feature you'd consider adding in future releases?

Alternative Approach - Embedding + Custom Head: If direct custom loss functions aren't supported, could we use the get_embedding functionality to obtain intermediate feature embeddings, then build our own MLP output head with custom loss functions? For example:

    embeddings = tabpfn_model.get_embedding(X_train)
    # Then build a custom MLP head with MAPE loss

Use Case Justification: In energy forecasting, MAPE is particularly important because:

It provides percentage-based errors that are more interpretable for stakeholders

It handles the intermittent nature of renewable generation better

Industry standards often require MAPE reporting

Additional Considerations:

Would either approach affect TabPFN's inference speed advantages?

Are there any limitations on embedding dimensions or compatibility issues to consider?

Thank you for your time and consideration. Looking forward to your insights!

Best regards,

Describe your proposed solution

Add a custom_loss parameter to TabPFNRegressor.fit().
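
A hypothetical sketch of what this could look like (the custom_loss parameter does not exist in the current API, and X_train/y_train are placeholders):

    import numpy as np
    from tabpfn import TabPFNRegressor

    def mape(y_true, y_pred, eps=1e-6):
        # Mean Absolute Percentage Error
        return np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

    reg = TabPFNRegressor()
    reg.fit(X_train, y_train, custom_loss=mape)  # hypothetical parameter, not in the current API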

Describe alternatives you've considered, if relevant

If direct custom loss functions aren't supported, could we use the get_embedding functionality to obtain intermediate feature embeddings, then build our own MLP output head with custom loss functions? For example:

    embeddings = tabpfn_model.get_embedding(X_train)
    # Then build custom MLP head with MAPE loss

Additional context

No response

Impact

None

hexuwei-epri avatar Oct 22 '25 07:10 hexuwei-epri

Thanks for the detailed use case! With the current API you can already target custom losses (like MAPE) by choosing the point prediction that minimizes expected loss under TabPFN’s predictive distribution.

Concretely, call:

    dist = reg.predict(X, output_type="full") 
    # dist["logits"]: torch.Tensor  [n_samples, n_bins] (post-processed) 
    # dist["criterion"]: FullSupportBarDistribution (gives icdf/mean/median/mode)

Then approximate the expectation by evaluating your loss on a quantile grid of the predictive distribution. Using those same quantile values as the candidate predictions gives a simple, consistent rule: pick the candidate that yields the lowest expected loss.

LLM-generated code sketch of how this could work:

    import numpy as np
    
    def predict_min_expected_mape(reg, X, *, qmin=0.01, qmax=0.99, num_q=101, eps=1e-6):
        """
        Returns y_hat that minimizes expected MAPE under TabPFN's predictive distribution.
        Uses an evenly spaced quantile grid for both (i) Monte Carlo expectation and
        (ii) candidate predictions.
        
        reg: fitted TabPFNRegressor
        X:   array-like of shape (n_samples, n_features)
        """
        out = reg.predict(X, output_type="full")  # uses your API
        logits = out["logits"]                    # torch.Tensor [n_samples, n_bins]
        criterion = out["criterion"]              # FullSupportBarDistribution
        
        # 1) Build a quantile grid; reuse for candidates + expectation
        q_grid = np.linspace(qmin, qmax, num_q, dtype=float)
        
        # 2) Evaluate icdf at each q => samples over Y|X and also candidate y_hats
        #    criterion.icdf(logits, q) -> torch.Tensor [n_samples]
        samples = [criterion.icdf(logits, float(q)).cpu().detach().numpy() for q in q_grid]
        samples = np.stack(samples, axis=1)   # shape: [n_samples, K], K = num_q
        
        # 3) Expected MAPE for choosing candidate j is: E[ |Y - y_j| / max(|Y|, eps) ]
        #    Approximate by uniform average over the quantile samples (Monte Carlo).
        denom = np.maximum(np.abs(samples), eps)                # [n, K]
        
        # Broadcast: for each candidate j, compute mean_k  |samples[:,k] - samples[:,j]| / denom[:,k]
        # We'll vectorize by expanding dims:
        diffs = np.abs(samples[:, :, None] - samples[:, None, :])   # [n, K, K]
        exp_mape = (diffs / denom[:, :, None]).mean(axis=1)         # [n, K]  (mean over k)
        
        # 4) Pick argmin across candidates for each sample
        best_idx = exp_mape.argmin(axis=1)                          # [n]
        y_hat = samples[np.arange(samples.shape[0]), best_idx]      # [n]
        return y_hat
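
A minimal usage sketch, assuming X_train, y_train, and X_test are already defined; only existing API calls are used:

    from tabpfn import TabPFNRegressor

    reg = TabPFNRegressor()
    reg.fit(X_train, y_train)

    # MAPE-targeted point predictions, with no retraining of TabPFN
    y_hat = predict_min_expected_mape(reg, X_test)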

Why this should work

TabPFN gives you a full predictive distribution (via the logits and criterion). The Bayes decision for any loss $L(Y, \hat{y})$ is the prediction $\hat{y}^* = \arg\min_{\hat{y}} \mathbb{E}[L(Y, \hat{y}) \mid X]$. We approximate that expectation with a quantile grid over the model's own distribution, so no retraining is required and you can target MAPE, asymmetric costs, pinball loss, etc.

Embeddings + custom head (alternative)

If you prefer a learned head with a custom objective, that should also work, but I would expect it to be significantly more effort.
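
A minimal sketch of that route, assuming embeddings is an [n_samples, d] array extracted from TabPFN (e.g. via the get_embedding call mentioned above) and y_train holds the targets; this is not an official API, just an illustration:

    import torch
    import torch.nn as nn

    def mape_loss(pred, target, eps=1e-6):
        # Differentiable MAPE for training the head
        return (torch.abs(target - pred) / torch.clamp(torch.abs(target), min=eps)).mean()

    # Small MLP head on top of frozen TabPFN embeddings
    emb = torch.as_tensor(embeddings, dtype=torch.float32)  # assumed to come from TabPFN
    y = torch.as_tensor(y_train, dtype=torch.float32)       # assumed training targets
    head = nn.Sequential(nn.Linear(emb.shape[1], 128), nn.ReLU(), nn.Linear(128, 1))
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)

    for _ in range(200):
        opt.zero_grad()
        loss = mape_loss(head(emb).squeeze(-1), y)
        loss.backward()
        opt.step()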

We'd also welcome a native implementation as a contribution if you make progress.

noahho avatar Oct 22 '25 11:10 noahho

Hi, is the intermediate-layer embedding obtained by directly calling the get_embedding function? I want to use the extracted features for downstream tasks. My task is survival analysis, which requires attention to both events and time. Currently, I can only use the classifier, extract the intermediate-layer embedding, and then proceed with the downstream tasks.

20030125lc-cyber avatar Nov 15 '25 12:11 20030125lc-cyber