[Bug]: MultiTaskGP Prediction when all_tasks is passed
What happened?
I have a specific application in which MultiTaskGP receives some tasks not present in the training data. As far as I understand, the solution is to set the all_tasks argument of the MultiTaskGP.
It worked fine on botorch==0.15.0, but it's not working on botorch==0.16.0: even if I set all_tasks, it raises ValueError: Received invalid raw task values. I also verified that the num_tasks property is set to the number of tasks inferred from the training data, not the number passed via all_tasks.
I noticed that there were some recent changes (#3006) in which the logic regarding missing tasks was modified, replacing all_tasks with all_tasks_inferred in many parts of the code. I was not able to find much information about the changes, so I apologize if I'm getting this error because I'm misunderstanding how to use the class.
I'm happy to help if there's anything that needs to be changed.
Thanks!
Please provide a minimal, reproducible example of the unexpected behavior.
import torch
from botorch.models.multitask import MultiTaskGP
# Creating training data with 3 tasks (0, 1, 2)
train_data = torch.randn(10, 2, dtype=torch.float64)
train_tasks = torch.randint(0, 3, (10, 1))
train_X = torch.cat([train_data, train_tasks], dim=1)
train_Y = torch.randn(10, 1, dtype=torch.float64)
# Creating testing data with 5 tasks (0, 1, 2, 3, 4)
test_data = torch.randn(5, 2, dtype=torch.float64)
test_tasks = torch.arange(5).reshape(-1, 1)
test_X = torch.cat([test_data, test_tasks], dim=1)
# Creating multi-task GP with 5 tasks
gp = MultiTaskGP(
train_X=train_X, train_Y=train_Y, task_feature=2, all_tasks=list(range(5))
)
print(gp.num_tasks) # outputs 3 (number of inferred tasks)
gp.posterior(test_X) # raises error
Please paste any relevant traceback/logs produced by the example provided.
Traceback (most recent call last):
File "/Users/gsutterp/tmp/multitaskgp.py", line 21, in <module>
gp.posterior(test_X)
~~~~~~~~~~~~^^^^^^^^
File "/Users/gsutterp/base_env/lib/python3.13/site-packages/botorch/models/gpytorch.py", line 1059, in posterior
task_features = self._map_tasks(task_values=task_features)
File "/Users/gsutterp/base_env/lib/python3.13/site-packages/botorch/models/multitask.py", line 310, in _map_tasks
raise ValueError(
...<3 lines>...
)
ValueError: Received invalid raw task values. Expected raw value to be in {0, 1, 2}, but got unexpected task values: {3, 4}.
BoTorch Version
0.16.0
Python Version
No response
Operating System
No response
(Optional) Describe any potential fixes you've considered to the issue outlined above.
No response
Pull Request
None
Code of Conduct
- [x] I agree to follow BoTorch's Code of Conduct
Hi @suttergustavo. The error goes away if you specify validate_task_values=False when constructing the model. However, the behavior might be slightly different from what you had prior to 0.16.0 (https://github.com/meta-pytorch/botorch/pull/2960 specifically). I believe previously it used the task / index kernel with unobserved (thus untrained) tasks and just looked up whatever entry it had. Now it will default to the first output task, which is 0 in this case.
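For concreteness, a sketch of that workaround applied to the repro above (it reuses train_X, train_Y, and test_X from the snippet in the issue description):
gp = MultiTaskGP(
    train_X=train_X,
    train_Y=train_Y,
    task_feature=2,
    all_tasks=list(range(5)),
    validate_task_values=False,  # skip the raw task value check
)
print(gp.num_tasks)   # still 3, inferred from the training data
gp.posterior(test_X)  # no longer raises; tasks 3 and 4 are mapped to the output task (0)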
Hi @suttergustavo and @saitcakmak 👋🏼 I was just going to open a similar issue, but then stumbled over this one here.
Before commenting, here is my minimal example. While I think it's related, it's not quite identical: I get a different error message and can't even instantiate the model. Hence, while caused by the same PR (#2960), I believe the situation is actually a slightly different one. In fact, the validate_task_values argument has no effect on it:
import torch
from botorch.models import MultiTaskGP
train_X = torch.tensor([[0, 1.0], [0, 2.0]])
train_Y = torch.tensor([[1.0], [2.0]])
model = MultiTaskGP(train_X, train_Y, task_feature=0, all_tasks=[0, 1], rank=2)
RuntimeError: Cannot create a task covariance matrix larger than the number of tasks
@saitcakmak: can you comment on whether you think it's a different problem? --> If so, I can also open a separate issue if preferred.
Now my thoughts on the original one
I believe previously it used the task / index kernel with unobserved (thus untrained) tasks and just looked up whatever entry it had. Now it will default to the first output task, which is 0 in this case.
I think this change is quite confusing in several ways; perhaps you can share your thoughts:
- While using the untrained tasks is perhaps not great in terms of predictive accuracy, it's in my eyes clearly the "default thing" that the user would expect to happen. Also, it still makes sense to actually request these predictions, e.g. for benchmarking purposes where you compare the performance of the trained against the untrained ones. To my understanding, a "default" should always attempt to do the least surprising or least opinionated thing. However, defaulting to the first output task in this case appears completely arbitrary to me. Wouldn't it thus make much more sense to flip the default value for that flag?
- The name validate_task_values is quite confusing IMO. With "validation" I'd expect to get an error in case of unexpected/unsupported input, not a silent change of the internal logic and a switch to an opinionated fallback. I'm afraid this carries the danger that people really run into silent bugs.
- Finally, the raised error message is quite misleading, I think, because getting a message like "but got unexpected task values: {3, 4}" is very surprising when the user has explicitly provided these tasks, i.e. they are by no means "unexpected" 😬
Happy to hear your opinions 🙃
cc @sdaulton
Thank you, @saitcakmak !
Yeah, these are related but slightly different issues. Currently, MultiTaskGP will create an index kernel with the number of tasks equal to the number of tasks in the training data, not the tasks in all_tasks. If you pass validate_task_values=False, tasks that are not in the training data but are in all_tasks will be mapped to the target task value (the output task). In your case, @AdrianSosic, the issue is that currently we don't support instantiating index kernels with more tasks than we have data for.
You could implement support for that, which would require:
- a new argument indicating how tasks in all_tasks but without training data should be handled, e.g. whether they should be remapped to the target output task (as is currently done) or whether they should be treated as separate tasks (a rough sketch of this distinction follows below)
- making num_tasks equal to the number of tasks in all_tasks (https://github.com/meta-pytorch/botorch/blob/main/botorch/models/multitask.py#L240C9-L240C49) if it is desired to model all tasks separately, based on that input argument
- adding those tasks without training data into the observed_task_values when instantiating the task_mapper (https://github.com/meta-pytorch/botorch/blob/main/botorch/models/multitask.py#L313-L315)
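As a standalone illustration of the two handling strategies (this is not BoTorch code; the names observed, target_task, and treat_missing_as_separate are made up for the sketch):
# Hypothetical sketch, not BoTorch internals: two ways a task mapper could handle
# tasks that appear in all_tasks but have no training data.
observed = [0, 1, 2]          # tasks present in the training data
all_tasks = [0, 1, 2, 3, 4]   # tasks declared by the user via all_tasks
target_task = 0               # the target / output task

def map_tasks(raw_tasks, treat_missing_as_separate):
    if treat_missing_as_separate:
        # every declared task gets its own index -> index kernel of size len(all_tasks)
        lookup = {t: i for i, t in enumerate(all_tasks)}
    else:
        # unobserved tasks collapse onto the target task -> index kernel of size len(observed)
        lookup = {t: i for i, t in enumerate(observed)}
        lookup.update({t: lookup[target_task] for t in all_tasks if t not in observed})
    return [lookup[t] for t in raw_tasks]

print(map_tasks([0, 3, 4], treat_missing_as_separate=False))  # [0, 0, 0]
print(map_tasks([0, 3, 4], treat_missing_as_separate=True))   # [0, 3, 4]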
While using the untrained tasks is perhaps not great in terms of predictive accuracy, it's in my eyes clearly the "default thing" that the user would expect to happen. Also, it still makes sense to actually request these predictions, e.g. for benchmarking purposes where you compare the performance of the trained against the untrained ones.
@AdrianSosic how should these tasks be handled? At the end of the day the inter-task correlation matrix you use for prediction needs some kind of entries to express the cross-task correlations. If that is just randomly initialized (as is currently the case) then this seems somewhat meaningless. IMO you'd have to have some kind of prior about the correlation and initialize the respective elements of that inter-task correlation matrix accordingly for this to make sense.
the issue is that currently we don't support instantiating index kernels with more tasks than we have data for.
Hi @sdaulton, thanks for your answer. I'm not sure I fully understand the underlying reason yet, but the critical point is: have you noticed that this error is, strictly speaking, a backwards-incompatible change, since the code worked without problems in earlier versions? Can you perhaps explain what exactly changed in the logic, such that this was possible previously but now breaks?
how should these tasks be handled?
Well, I'm just sharing my thoughts here, so I won't get mad if you disagree 🙃
I'm looking at this mostly from a user perspective. Of course, the randomly initialized values carry no information, but I'd argue that interface-wise it still corresponds to the "natural" choice of edge-case handling that comes with no surprises.
What I mean: Let's consider the case where you additionally passed task_covar_prior. Then, the predictions you get would be "mixtures" of your source tasks according to your prior over inter-task covariances, which would indeed be a meaningful outcome, right? Now if I loosen the prior more and more until it's completely flat, I eventually end up with a "maximum likelihood estimate with zero data". While the latter is of course not meaningful in terms of predictions (except for benchmarking), the outputs along the way still are, and it feels quite weird to me to suddenly introduce a change of the underlying logic for the limiting case. Especially jumping to one particular task is surprising since the model is otherwise fully symmetric w.r.t. tasks.
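To make that thought experiment concrete, here is a rough sketch of attaching such a prior. It assumes the task_covar_prior argument mentioned above is still accepted by MultiTaskGP; the LKJ hyperparameters and the Gamma prior on the task standard deviations are placeholders, not recommended values:
import torch
from botorch.models.multitask import MultiTaskGP
from gpytorch.priors import GammaPrior, LKJCovariancePrior

train_X = torch.tensor([[0.0, 1.0], [0.0, 2.0], [1.0, 1.5]], dtype=torch.float64)
train_Y = torch.tensor([[1.0], [2.0], [1.5]], dtype=torch.float64)

num_tasks = 2
# For the LKJ prior, eta > 1 concentrates mass on weak inter-task correlations,
# while eta = 1 is flat over correlation matrices (the limiting case discussed above).
task_covar_prior = LKJCovariancePrior(
    n=num_tasks,
    eta=2.0,
    sd_prior=GammaPrior(1.0, 0.15),  # placeholder prior on the task std devs
)
model = MultiTaskGP(
    train_X, train_Y, task_feature=0, task_covar_prior=task_covar_prior
)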
I could cook up the same argument with a scenario where, instead of "reducing" the prior, we start with a large training data set that we gradually shrink until suddenly some tasks are no longer represented. Here, too, a user would not expect a sudden change of logic at the point where tasks disappear from the training set, at least not as the default behavior 🙃
Perhaps one more thought, regardless of anything above, on what I think is suboptimal in terms of responsibilities:
The following line from my example simply used to work earlier on:
model = MultiTaskGP(train_X, train_Y, task_feature=0, all_tasks=[0, 1], rank=2)
That is, as long as the tasks in train_X were a subset of all_tasks, the user didn't have to worry about the concrete content: they specify "all possible tasks" via all_tasks (i.e. the set of all possibilities, just like the name indicates) and provide a dataset compatible with that ✅
Now, the success of this call depends on the specific content of train_X, which means that validating train_X and potentially dispatching to different logic suddenly becomes the responsibility of the user ❌