
[FEATURE REQUEST]: Task-specific noise in MultiTaskGPs

Open · TobyBoyne opened this issue 9 months ago · 2 comments

Motivation

For problems where each observation comes from one of several tasks, existing MultiTaskGPs assume the same noise level across all tasks. This is not necessarily a good assumption, especially in a multi-fidelity setting where low fidelities are likely to be much noisier than high fidelities.

Describe the solution you'd like to see implemented in BoTorch.

I'd like a MultiTaskHadamardLikelihood that learns a different noise value for each task. This could also be made the default likelihood for all MultiTaskGPs.
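Concretely, the idea is a Gaussian likelihood that keeps one learnable noise variance per task and looks it up by the task index of each observation. Below is a minimal sketch of that idea; the class name `PerTaskGaussianLikelihood`, the way task indices are passed in, and all other naming are hypothetical illustrations, not the API in the GPyTorch PR.

```python
import torch
from gpytorch.constraints import GreaterThan
from gpytorch.distributions import MultivariateNormal
from gpytorch.likelihoods import Likelihood
from linear_operator.operators import DiagLinearOperator


class PerTaskGaussianLikelihood(Likelihood):
    """Sketch of a Gaussian likelihood with one learnable noise variance per task."""

    def __init__(self, num_tasks: int):
        super().__init__()
        # One raw (unconstrained) noise parameter per task, kept positive below.
        self.register_parameter(
            "raw_task_noises", torch.nn.Parameter(torch.zeros(num_tasks))
        )
        self.register_constraint("raw_task_noises", GreaterThan(1e-4))

    @property
    def task_noises(self) -> torch.Tensor:
        # Noise variances, one per task.
        return self.raw_task_noises_constraint.transform(self.raw_task_noises)

    def forward(self, function_samples, task_indices=None, **kwargs):
        # p(y | f): each observation uses the noise of its own task.
        noise = self.task_noises[task_indices]
        return torch.distributions.Normal(function_samples, noise.sqrt())

    def marginal(self, function_dist, task_indices=None, **kwargs):
        # Marginal p(y) for an exact GP: add a task-dependent diagonal to the
        # latent covariance instead of a single shared noise term.
        noise_covar = DiagLinearOperator(self.task_noises[task_indices])
        return MultivariateNormal(
            function_dist.mean, function_dist.lazy_covariance_matrix + noise_covar
        )
```

In a training loop the task indices would need to be threaded through to the likelihood (e.g. something like `mll(model(train_X), train_Y, task_indices)` with an exact marginal log likelihood); that plumbing, and how BoTorch's MultiTaskGP would forward its task feature, is exactly the part that should land in gpytorch first.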

Describe any alternatives you've considered to the above solution.

None

Is this related to an existing issue in BoTorch or another repository? If so please include links to those Issues here.

I have a pending PR in the gpytorch library that didn't receive much attention: https://github.com/cornellius-gp/gpytorch/pull/2481, relating to issue https://github.com/cornellius-gp/gpytorch/issues/877. If that PR is left unmerged, that code could be duplicated in BoTorch; if it is merged, then BoTorch models should be changed to use the new likelihood.

There are also a couple of TODOs in BoTorch that relate to this: https://github.com/pytorch/botorch/blob/094e3efa2702da185e2847a5e0b3768222ecf59d/botorch/models/gpytorch.py#L832

https://github.com/pytorch/botorch/blob/094e3efa2702da185e2847a5e0b3768222ecf59d/botorch/models/multitask.py#L203

Pull Request

Yes

Code of Conduct

  • [x] I agree to follow BoTorch's Code of Conduct

TobyBoyne · Mar 11 '25

@TobyBoyne thanks for raising this! We'd be interested in incorporating it; I believe it would be best to land the change in gpytorch and then use that (optionally) in botorch. I can help review the gpytorch changes. One thing I'd like to understand better is how much benefit this adds and which settings benefit the most from it. Do you have some examples or benchmarks on this that you could share?

Balandat · Mar 11 '25

Thanks for the reply, @Balandat!

The GPyTorch PR includes a toy multi-task benchmark with a different noise level for each task. There, the shared likelihood tends to learn a noise level close to that of the noisiest task, so the predictive distributions are wider than they should be even for the low-noise tasks.
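As a self-contained illustration (not the benchmark from the PR; the data-generating setup, seed, and noise levels below are made up), fitting the current MultiTaskGP to two tasks with very different noise levels shows how the single shared noise is dominated by the noisier task:

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import MultiTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

torch.manual_seed(0)
# Task 0 is nearly noiseless (sd 0.01); task 1 is very noisy (sd 0.5).
x = torch.rand(40, 1, dtype=torch.double)
task = torch.randint(0, 2, (40, 1)).to(torch.double)
noise_sd = 0.01 + 0.49 * task
y = torch.sin(6 * x) + noise_sd * torch.randn_like(x)

train_X = torch.cat([x, task], dim=-1)  # last column is the task feature
model = MultiTaskGP(train_X, y, task_feature=-1)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)

# A single noise variance is shared across both tasks, so it is typically
# pulled far above task 0's true noise level and applied to both tasks.
print(model.likelihood.noise)
```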

In terms of more real-world benchmarks, I've found this is a problem in multi-fidelity settings. For example, using early termination of neural network training as a low fidelity for HPO tends to be much noisier than allowing the training to converge. Similarly, for physics-simulation-based black boxes, simulating a smaller problem is both meaningfully different (so an LCM, a linear model of coregionalization, is needed) and noisier due to the higher impact of random initializations (so independent task noises are needed).

It's not immediately obvious to me how much downstream impact this has on BO. Poorly calibrated surrogates will likely hurt BO performance: this noise sharing leads to an overconfident GP posterior for the noisy tasks and an underconfident posterior for the less noisy tasks.

[Two images attached.]

TobyBoyne · Mar 11 '25

It looks like this was closed by #2997. Thanks, @sdaulton!

TobyBoyne · Sep 11 '25