[FEATURE REQUEST]: StratifiedStandardize for multi-output models
Motivation
Would it be possible to make StratifiedStandardize more general so that it can work with multi-output models, as Standardize does? This would be needed for MultiTaskGP, which requires stratified standardization and can work with multiple outputs.
Describe the solution you'd like to see implemented in BoTorch.
Same as Standardize, which supports multiple outputs via its m and outputs parameters.
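For reference, here is a plain-Python sketch (no BoTorch dependency; the function name is made up for illustration) of what Standardize's multi-output behavior amounts to: each of the m output columns is normalized independently with its own mean and standard deviation.

```python
def column_standardize(Y):
    """Standardize each output column of an n x m list-of-lists independently,
    conceptually what Standardize(m=m) does to the training outcomes."""
    n = len(Y)
    m = len(Y[0])
    # Per-column mean and (unbiased) standard deviation.
    means = [sum(row[j] for row in Y) / n for j in range(m)]
    stds = [
        (sum((row[j] - means[j]) ** 2 for row in Y) / (n - 1)) ** 0.5
        for j in range(m)
    ]
    # Each column is shifted and scaled with its own statistics.
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in Y]
```

The request here is essentially to combine this column-wise treatment with the per-task stratification that StratifiedStandardize already does for a single output.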
Describe any alternatives you've considered to the above solution.
No response
Is this related to an existing issue in BoTorch or another repository? If so please include links to those Issues here.
No response
Pull Request
None
Code of Conduct
- [x] I agree to follow BoTorch's Code of Conduct
Hi @Hrovatin,
This would be needed for MultiTaskGP, which requires stratified standardization and can work with multiple outputs.
StratifiedStandardize works with MultiTaskGP if you include the task values in X when calling MultiTaskGP.posterior, since that makes the posterior a single-output posterior. I agree that it would be good to support the case where there are multiple outputs too. Would you be willing to put up a PR? Here is an example of the current single-output support with MTGP:
```python
import math

import matplotlib.pyplot as plt
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models.multitask import MultiTaskGP
from botorch.models.transforms.outcome import StratifiedStandardize
from gpytorch.mlls.exact_marginal_log_likelihood import ExactMarginalLogLikelihood

tkwargs = {"dtype": torch.double}
torch.manual_seed(0)
# 40 training points, first half on task 0 and second half on task 1.
X_task = torch.zeros(40, 1, **tkwargs)
X_task[(X_task.shape[0] // 2) :] = 1
X = torch.cat(
    [torch.rand_like(X_task), X_task],
    dim=-1,
)
# Task 1 is offset by 5, so the tasks live on different output scales.
Y = torch.sin(2 * math.pi * X[..., :1]) + 5 * X_task + X[..., :1]
task0 = (X_task == 0).view(-1)
plt.plot(X[task0, 0], Y[task0], ".", ms=10)
task1 = (X_task == 1).view(-1)
plt.plot(X[task1, 0], Y[task1], ".", ms=10)
model = MultiTaskGP(
    X,
    Y,
    task_feature=1,
    outcome_transform=StratifiedStandardize(
        task_values=X_task.unique().long(), stratification_idx=1
    ),
)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
_ = fit_gpytorch_mll(mll)
# Test points for task 0 (task feature appended as the last column).
test_X = torch.linspace(0, 1, 101, **tkwargs).unsqueeze(1)
test_X = torch.cat(
    [
        test_X,
        torch.zeros(101, 1, **tkwargs),
    ],
    dim=-1,
)
# Test points for task 1.
test_X2 = torch.linspace(0, 1, 101, **tkwargs).unsqueeze(1)
test_X2 = torch.cat(
    [
        test_X2,
        torch.ones(101, 1, **tkwargs),
    ],
    dim=-1,
)
test_X_all = torch.cat([test_X, test_X2], dim=0)
# Including the task feature in X makes this a single-output posterior.
with torch.no_grad():
    posterior = model.posterior(test_X_all)
plt.plot(test_X_all[:101, 0], posterior.mean[:101], label="posterior mean, task 0")
plt.plot(test_X_all[101:, 0], posterior.mean[101:], label="posterior mean, task 1")
```
Thank you very much for the response. I think I wasn't clear: I meant that using StratifiedStandardize would prevent using the multi-output option of MultiTaskGP, if I am not mistaken.
We are currently testing MultiTaskGP with StratifiedStandardize; depending on how that affects performance, we will decide whether to use it. If we do, we would need multi-output support, so in that case I would probably implement it.
Another question:
What is your opinion on using StratifiedStandardize in a transfer-learning setting, where one domain may have far fewer measured data points, which may also be biased towards a specific region of the parameter space or output range? Did you ever observe that stratified output scaling could in fact be problematic when tasks do not cover the same regions of the output space?
Did you ever observe that doing stratified output scaling could in fact be problematic when tasks do not cover the same regions of the output space?
I haven't observed that. The task-task covariance should be able to capture tasks being on different scales. It seems like tasks not covering the same region would have the same effect with or without standardizing each task independently.
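For concreteness, stratified standardization simply normalizes each task's outcomes with that task's own statistics. A minimal plain-Python sketch (the function name is hypothetical; no BoTorch dependency):

```python
def stratified_standardize(Y, tasks):
    """Standardize each task's outcomes with that task's own mean and
    unbiased standard deviation, per-task analog of what
    StratifiedStandardize does for a single output."""
    out = [0.0] * len(Y)
    for t in set(tasks):
        idx = [i for i, ti in enumerate(tasks) if ti == t]
        vals = [Y[i] for i in idx]
        mean = sum(vals) / len(vals)
        std = (sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)) ** 0.5
        for i in idx:
            out[i] = (Y[i] - mean) / std
    return out
```

After this transform, tasks that originally lived on very different scales produce standardized values on a common scale, which is what lets the task-task covariance operate on comparable quantities.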
@Hrovatin
Did you ever observe that doing stratified output scaling could in fact be problematic when tasks do not cover the same regions of the output space?
We did actually observe this a little while back, which led us to change the default MultiTask parametrization. Stratified standardization can certainly be problematic when there isn't a lot of data on each task. The new parametrization, which includes a MultiTaskMean, should be able to better correlate data when the bias differs on a per-task basis.
However, I would contend that the bias has less to do with where in the search space the data is gathered, and more with how good the data is on each task. For example, if all the points gathered on task A are relatively good and all those gathered on task B are relatively bad, StratifiedStandardize may not be your friend. Nonetheless, the two biases (input space/output space) would be closely related.
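As a toy illustration of that failure mode (plain Python with made-up data): if every observation on task A is better than every observation on task B, per-task standardization erases that offset entirely.

```python
# Hypothetical data: every point on task A is better (higher) than any on task B.
Y_A = [9.0, 10.0, 11.0]
Y_B = [1.0, 2.0, 3.0]

def per_task_standardize(vals):
    """Standardize one task's outcomes with its own mean and unbiased std."""
    mean = sum(vals) / len(vals)
    std = (sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)) ** 0.5
    return [(v - mean) / std for v in vals]

Z_A = per_task_standardize(Y_A)
Z_B = per_task_standardize(Y_B)
# Both tasks map to [-1.0, 0.0, 1.0]: after stratified standardization the
# absolute quality gap between the tasks is no longer visible to the model.
```

Whether this matters in practice depends on whether the downstream model (e.g. via the task covariance or a per-task mean) can recover the offset from the task feature alone.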