Feature Request for Regression Tasks - Custom Loss Functions and Embedding Utilization
Describe the workflow you want to enable
Hi TabPFN Development Team,
First, thank you for this amazing work! I'm currently exploring TabPFN for renewable energy power prediction tasks and have some questions regarding regression functionality.
Background Context: I'm working on solar/wind power forecasting where evaluation metrics like MAPE (Mean Absolute Percentage Error) are more meaningful than standard MSE/MAE, due to the specific characteristics of energy data.
Questions:
Custom Loss Functions for Regression: Currently, when using TabPFNRegressor.fit(), is it possible to use custom loss functions like MAPE instead of the default loss? If not, would this be a feature you'd consider adding in future releases?
Alternative Approach - Embedding + Custom Head: If direct custom loss functions aren't supported, could we use the get_embedding functionality to obtain intermediate feature embeddings, then build our own MLP output head with custom loss functions? For example:
```python
embeddings = tabpfn_model.get_embedding(X_train)
# Then build a custom MLP head with a MAPE loss
```
Use Case Justification: In energy forecasting, MAPE is particularly important because:
It provides percentage-based errors that are more interpretable for stakeholders
It handles the intermittent nature of renewable generation better
Industry standards often require MAPE reporting
Additional Considerations:
Would either approach affect TabPFN's inference speed advantages?
Are there any limitations on embedding dimensions or compatibility issues to consider?
Thank you for your time and consideration. Looking forward to your insights!
Best regards,
Describe your proposed solution
Add a `custom_loss` parameter to `TabPFNRegressor.fit()`.
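A hypothetical sketch of the proposed API (the `custom_loss` argument does not exist in the current TabPFN API; `mape` here is just an illustrative metric):

```python
import numpy as np
from tabpfn import TabPFNRegressor

def mape(y_true, y_pred, eps=1e-6):
    # Percentage error, guarded against near-zero targets.
    return np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

reg = TabPFNRegressor()
reg.fit(X_train, y_train, custom_loss=mape)  # proposed signature, not in the current API
```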
Describe alternatives you've considered, if relevant
If direct custom loss functions aren't supported, could we use the get_embedding functionality to obtain intermediate feature embeddings, then build our own MLP output head with custom loss functions? For example:
```python
embeddings = tabpfn_model.get_embedding(X_train)
# Then build a custom MLP head with a MAPE loss
```
Additional context
No response
Impact
None
Thanks for the detailed use case! With the current API you can already target custom losses (like MAPE) by choosing the point prediction that minimizes expected loss under TabPFN’s predictive distribution.
Concretely, call:
```python
dist = reg.predict(X, output_type="full")
# dist["logits"]: torch.Tensor [n_samples, n_bins] (post-processed)
# dist["criterion"]: FullSupportBarDistribution (gives icdf/mean/median/mode)
```
Then approximate the expectation by evaluating your loss on a quantile grid of the predictive distribution. Using those same quantile values as the candidate predictions gives a simple, consistent rule: pick the candidate that yields the lowest expected loss.
LLM-generated code sketch of how this could work:
```python
import numpy as np


def predict_min_expected_mape(reg, X, *, qmin=0.01, qmax=0.99, num_q=101, eps=1e-6):
    """
    Return y_hat that minimizes expected MAPE under TabPFN's predictive distribution.

    Uses an evenly spaced quantile grid for both (i) the Monte Carlo expectation and
    (ii) the candidate predictions.

    reg: fitted TabPFNRegressor
    X: array-like of shape (n_samples, n_features)
    """
    out = reg.predict(X, output_type="full")  # uses your API
    logits = out["logits"]        # torch.Tensor [n_samples, n_bins]
    criterion = out["criterion"]  # FullSupportBarDistribution

    # 1) Build a quantile grid; reuse it for candidates + expectation
    q_grid = np.linspace(qmin, qmax, num_q, dtype=float)

    # 2) Evaluate icdf at each q => samples over Y|X and also candidate y_hats
    #    criterion.icdf(logits, q) -> torch.Tensor [n_samples]
    samples = [criterion.icdf(logits, float(q)).cpu().detach().numpy() for q in q_grid]
    samples = np.stack(samples, axis=1)  # shape: [n_samples, K], K = num_q

    # 3) Expected MAPE for choosing candidate j is: E[ |Y - y_j| / max(|Y|, eps) ]
    #    Approximate by a uniform average over the quantile samples (Monte Carlo).
    denom = np.maximum(np.abs(samples), eps)  # [n, K]
    # Broadcast: for each candidate j, compute mean_k |samples[:, k] - samples[:, j]| / denom[:, k]
    diffs = np.abs(samples[:, :, None] - samples[:, None, :])  # [n, K, K]
    exp_mape = (diffs / denom[:, :, None]).mean(axis=1)        # [n, K] (mean over k)

    # 4) Pick the argmin across candidates for each sample
    best_idx = exp_mape.argmin(axis=1)                          # [n]
    y_hat = samples[np.arange(samples.shape[0]), best_idx]      # [n]
    return y_hat
```
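For example, assuming a fitted regressor `reg` and held-out arrays `X_test` / `y_test`, you could use it like:

```python
y_hat = predict_min_expected_mape(reg, X_test)
test_mape = np.mean(np.abs(y_test - y_hat) / np.maximum(np.abs(y_test), 1e-6))
print(f"Test MAPE: {test_mape:.3f}")
```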
Why this should work
TabPFN gives you a full predictive distribution (via logits + criterion). The Bayes decision for any loss $L(Y, \hat{y})$ is the prediction $y^*$ that minimizes the expected loss: $y^* = \arg\min_{\hat{y}} \mathbb{E}[L(Y, \hat{y}) \mid X]$. We approximate that expectation by a quantile grid over the model's own distribution, with no retraining required, so you can target MAPE, asymmetric costs, pinball loss, etc.
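As a minimal sketch of swapping in a different loss under the same decision rule, here is the pinball (quantile) loss plugged into the quantile-grid scheme. It assumes the same `samples` array of quantile values built in the sketch above; `pinball` and `pick_min_expected_loss` are illustrative helper names, not TabPFN API:

```python
import numpy as np

def pinball(y_true, y_pred, tau=0.9):
    # Pinball / quantile loss at level tau, elementwise and broadcastable.
    diff = y_true - y_pred
    return np.where(diff >= 0, tau * diff, (tau - 1.0) * diff)

def pick_min_expected_loss(samples, loss_fn):
    """Pick, per row, the candidate quantile value minimizing expected loss.

    samples: [n_samples, K] array of quantile values (as in the sketch above).
    loss_fn: broadcastable loss taking (y_true, y_pred) arrays.
    """
    # losses[:, k, j] = loss(Y = samples[:, k], y_hat = samples[:, j])
    losses = loss_fn(samples[:, :, None], samples[:, None, :])  # [n, K, K]
    exp_loss = losses.mean(axis=1)                              # average over Y samples
    best = exp_loss.argmin(axis=1)
    return samples[np.arange(samples.shape[0]), best]
```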
Embeddings + custom head (alternative)
If you prefer a learned head with a custom objective, that should also work, but I would expect it to be significantly more effort.
We'd also welcome a native implementation as a contribution in case you make progress.
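A minimal sketch of that route, assuming `get_embedding` (as referenced above) returns one fixed-size embedding vector per row; the MLP head, MAPE objective, and training loop below are illustrative, not part of the TabPFN API:

```python
import numpy as np
import torch
import torch.nn as nn

# Assumption: get_embedding returns a [n_samples, d] float array per input matrix.
emb_train = np.asarray(tabpfn_model.get_embedding(X_train), dtype=np.float32)
emb_val = np.asarray(tabpfn_model.get_embedding(X_val), dtype=np.float32)

head = nn.Sequential(nn.Linear(emb_train.shape[1], 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

def mape_loss(pred, target, eps=1e-6):
    # Differentiable MAPE used as the training objective for the head.
    return (torch.abs(target - pred) / torch.clamp(torch.abs(target), min=eps)).mean()

Xt = torch.from_numpy(emb_train)
yt = torch.from_numpy(np.asarray(y_train, dtype=np.float32)).unsqueeze(1)

for epoch in range(200):
    opt.zero_grad()
    loss = mape_loss(head(Xt), yt)
    loss.backward()
    opt.step()

y_pred_val = head(torch.from_numpy(emb_val)).squeeze(1).detach().numpy()
```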
Hi, is the intermediate-layer embedding obtained by directly calling the get_embedding function? I want to use the extracted features for downstream tasks. My task is survival analysis, which requires attention to both events and time. Currently, I can only use the classifier, extract the intermediate-layer embedding, and then proceed with the downstream tasks.