
[GENERAL SUPPORT]: Model Fitting

leolin8806 opened this issue 1 month ago · 5 comments

Question

Hi,

I'm using Ax + BoTorch for a physical experiment involving a custom drone propeller. For every trial, I choose a set of input parameters:

camber, root chord, tip chord, corner radius, angle of attack, and symmetric (as a yes/no value)

After running each trial, I measure four outputs: lift, drag, rpm, and vibration.

My goal is to perform Bayesian optimization where the inputs are the propeller geometry parameters, and the outputs are the physical measurements above (e.g., maximizing lift/drag while also modeling rpm and vibration).

Because I have multiple outputs per experiment, I attempted to use MultiTaskGP from BoTorch. However, Ax skips the MultiTaskGP model and eventually throws the error:

ModelFittingError: Cannot fit MultiTaskGP without task feature.

I now understand that MultiTaskGP requires a task feature dimension, but in my setup each trial has multiple metrics, not multiple “tasks.” I do not have different fidelities, environments, or conditions; all metrics are recorded from the same physical experiment.

So my questions are:

  1. Is MultiTaskGP appropriate for multi-output data like this?

My understanding now is that tasks ≠ metrics, but I want to confirm whether MultiTaskGP is intended for multi-output regression, or only for multi-fidelity / multi-environment modeling.

  2. If MultiTaskGP is the correct model, what should the task_feature be?

Since I have outputs like lift, drag, rpm, and vibration, would each metric need to become a separate “task”? And would that require reshaping the data into long-form with a task column and only one metric per row?

  3. If MultiTaskGP is not the intended solution, what model should I be using for multi-output BO?

Is the recommended approach: independent SingleTaskGPs (one per metric), ModelListGP in BoTorch, or something else supported by Ax?

  4. Is there a built-in way in Ax’s modular BoTorch backend to jointly model multiple outputs?

I want to follow Ax/BoTorch best practices for multi-output optimization. Any guidance on which model class is correct for this scenario, and on how the task feature should be defined if applicable, would be greatly appreciated.
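For reference, here is roughly the relevant part of my current setup (a simplified sketch, not my exact code):

from ax.adapter.registry import Generators
from ax.generation_strategy.generator_spec import GeneratorSpec
from ax.generators.torch.botorch_modular.surrogate import ModelConfig, SurrogateSpec
from botorch.models.multitask import MultiTaskGP

# Requesting MultiTaskGP directly as the surrogate. The search space only
# contains the geometry parameters (no task feature), which is what
# eventually produces the ModelFittingError above.
surrogate_spec = SurrogateSpec(
    model_configs=[ModelConfig(botorch_model_class=MultiTaskGP)]
)
generator_spec = GeneratorSpec(
    generator_enum=Generators.BOTORCH_MODULAR,
    model_kwargs={"surrogate_spec": surrogate_spec},
)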

Thanks.


leolin8806 · Nov 18 '25

Is MultiTaskGP appropriate for multi-output data like this? My understanding is now that tasks ≠ metrics. But I want to confirm whether MultiTaskGP is intended for: multi-output regression or only multi-fidelity / multi-environment modeling

Typically, MultiTaskGP is used for multi-fidelity / multi-source modeling. If there are correlations between the outcomes, you could also designate the different outcomes as different tasks and try to infer the correlations between them. However, since you observe all metrics for every input configuration, that would only help if there is strong, correlated observation noise on the outcomes.
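To make the tasks-vs-metrics distinction concrete: treating metrics as tasks means reshaping the data into long form, with one metric value per row and an extra task-index column appended to the inputs. A toy sketch in raw BoTorch (illustrative data only):

import torch
from botorch.models.multitask import MultiTaskGP

# Toy data: 5 points in 2D, with two metrics observed at each point.
X = torch.rand(5, 2, dtype=torch.double)
Y = torch.rand(5, 2, dtype=torch.double)  # columns: metric 0, metric 1

# Long form: repeat X once per metric, appending the task index as a last column.
X_long = torch.cat(
    [torch.nn.functional.pad(X, (0, 1), value=float(i)) for i in range(2)], dim=0
)  # shape (10, 3)
Y_long = torch.cat([Y[:, i : i + 1] for i in range(2)], dim=0)  # shape (10, 1)

model = MultiTaskGP(X_long, Y_long, task_feature=-1, output_tasks=[0, 1])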

If MultiTaskGP is not the intended solution, what model should I be using for multi-output BO?

You should just use the Ax default, which fits an independent SingleTaskGP for each outcome. As mentioned above, the benefit of modeling correlations across different metrics is generally small compared to the increased complexity and runtime.
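Concretely, the default path requires no SurrogateSpec at all. A minimal sketch (parameter names and bounds are illustrative, and run_experiment is a hypothetical stand-in for the physical measurement):

from ax.api.client import Client
from ax.api.configs import RangeParameterConfig

client = Client()
client.configure_experiment(
    parameters=[
        RangeParameterConfig(name="camber", parameter_type="float", bounds=(0.0, 0.12)),
        RangeParameterConfig(name="angle_of_attack", parameter_type="float", bounds=(0.0, 15.0)),
        # ... remaining geometry parameters ...
    ]
)
client.configure_optimization(objective="lift_to_drag")

# Each metric reported in raw_data is modeled with its own SingleTaskGP.
for _ in range(20):
    for index, parameters in client.get_next_trials(max_trials=1).items():
        client.complete_trial(
            trial_index=index,
            raw_data={"lift_to_drag": run_experiment(parameters)},  # hypothetical
        )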

Is there a built-in way in Ax’s modular BoTorch backend to jointly model multiple outputs?

I don't think we currently expose that in Ax. It may be possible with the right configuration, but I'm not sure about that, and it's certainly not fully supported or tested. cc @saitcakmak, who has a deeper understanding of the Ax/BoTorch modeling interface.

Balandat · Nov 19 '25

Is there a built-in way in Ax’s modular BoTorch backend to jointly model multiple outputs?

Technically, yes -- though I haven't tried this in quite a while. If you have block-design observations (all metrics to be modeled are observed for all trials) and you configure SurrogateSpec with allow_batched_models=True, it will combine them into a single SupervisedDataset and fit a single model with them. By default, this is just a multi-output SingleTaskGP (which is just an independent batched model under the hood), but you could provide a model class that models correlations across outcomes instead. This could be a simple modification of MultiTaskGP that automatically constructs the task index, etc.

In addition to what @Balandat said above, there are other caveats. Even if you pass allow_batched_models=True, Ax may decide to model the metrics independently (e.g., if the model class is MultiTaskGP). There is a should_use_model_list helper that determines this; it'd need to return False for the datasets to be combined and modeled together. We had a bunch of issues with joint modeling in the past, so our setup heavily favors independent models.

saitcakmak · Nov 19 '25

Hi Max and Sait,

Thanks so much for the help so far with Leo's question! He is a student working on a project I'm also involved in, so maybe I can add some info.

I wrote a first version of their code, which used the standard SingleTaskGP to directly model the lift/drag quantity we want to optimize. However, the experimentalists leading this project specifically told us that the information from the other metrics (float rpm and binary vibration) is highly correlated with our objective, and asked us to model them jointly. Overall, we are only trying to optimize one quantity (lift/drag), but we want to use the information from all five metrics (lift, drag, rpm, vibration, and lift/drag), which are highly correlated, both in mean and in noise. We'd follow an approach similar to the paper https://proceedings.neurips.cc/paper_files/paper/2007/file/66368270ffd51418ec58bd793f2d9b1b-Paper.pdf, which is also linked in the MultiTaskGP documentation.

The experiments can take days to run, so in comparison we don't care much about the computational cost of generating samples, as long as these correlations can be modeled. My understanding is that using BoTorch's MultiTaskGP and setting each metric as a separate task would accomplish this goal. I'd then hook that into Ax by specifying the ModelConfig inside SurrogateSpec (similar to the tutorial at https://ax.dev/docs/tutorials/modular_botorch/). Is this approach correct? If so, could you provide some guidance on how to specify this behavior when defining the ModelConfig and SurrogateSpec?

I was also considering coding something custom myself for this, but didn’t want to reinvent the wheel given all the nice things you all have already implemented in Ax/BoTorch :)

-Leonardo

leonardoguilhoto · Nov 20 '25

Hi @leolin8806 & @leonardoguilhoto. I spent some time looking into this. There isn't a nice off-the-shelf way of doing this, but with some customization, I got something working. Note that this does require block-design, i.e., all metrics must be observed for all trials.

Here is the code, modified from the modular BoTorch tutorial.

Boilerplate, mostly copied over.

from typing import Any
from unittest import mock

import torch
from ax.adapter.registry import Generators
from ax.api.client import Client
from ax.api.configs import RangeParameterConfig
from ax.core.search_space import SearchSpaceDigest
from ax.generation_strategy.center_generation_node import CenterGenerationNode
from ax.generation_strategy.generation_node import GenerationNode
from ax.generation_strategy.generation_strategy import GenerationStrategy
from ax.generation_strategy.generator_spec import GeneratorSpec
from ax.generation_strategy.transition_criterion import MinTrials
from ax.generators.torch.botorch_modular.surrogate import (
    _construct_submodules,
    ModelConfig,
    submodel_input_constructor,
    Surrogate,
    SurrogateSpec,
)
from botorch.exceptions.errors import UnsupportedError
from botorch.models.multitask import MultiTaskGP
from botorch.utils.datasets import SupervisedDataset


def construct_generation_strategy(
    generator_spec: GeneratorSpec,
    node_name: str,
) -> GenerationStrategy:
    """Constructs a Center + Sobol + Modular BoTorch `GenerationStrategy`
    using the provided `generator_spec` for the Modular BoTorch node.
    """
    botorch_node = GenerationNode(
        name=node_name,
        generator_specs=[generator_spec],
    )
    sobol_node = GenerationNode(
        name="Sobol",
        generator_specs=[
            GeneratorSpec(
                generator_enum=Generators.SOBOL,
                # Let's use model_kwargs to set the random seed.
                model_kwargs={"seed": 0},
            ),
        ],
        transition_criteria=[
            # Transition to BoTorch node once there are 5 trials on the experiment.
            MinTrials(
                threshold=5,
                transition_to=botorch_node.name,
                use_all_trials_in_exp=True,
            )
        ],
    )
    # Center node is a customized node that uses a simplified logic and has a
    # built-in transition criteria that transitions after generating once.
    center_node = CenterGenerationNode(next_node_name=sobol_node.name)
    return GenerationStrategy(
        name=f"Center+Sobol+{node_name}", nodes=[center_node, sobol_node, botorch_node]
    )

Custom model and input constructor

class CustomMTGP(MultiTaskGP):
    @classmethod
    def construct_inputs(
        cls,
        training_data: SupervisedDataset,
    ) -> dict[str, Any]:
        num_tasks = len(training_data.outcome_names)
        if num_tasks < 2:
            raise UnsupportedError("Expected multi-output dataset.")
        if training_data.Yvar is not None:
            raise NotImplementedError(
                "If noise support is needed, repeat below with Yvar."
            )
        # Repeat Xs and append task index, counting up from 0.
        X = training_data.X
        all_Xs = torch.cat(
            [torch.nn.functional.pad(X, (0, 1), value=i) for i in range(num_tasks)],
            dim=0,
        )
        # Break apart Ys into individual outcomes.
        Y = training_data.Y
        all_Ys = torch.cat([Y[:, i : i + 1] for i in range(num_tasks)], dim=0)
        return {
            "train_X": all_Xs,
            "train_Y": all_Ys,
            "task_feature": -1,
            # Make sure model produces outputs for all tasks.
            "output_tasks": list(range(num_tasks)),
        }


# MTGP input constructor does not allow multi-output MTGPs. Need to register a custom one.
@submodel_input_constructor.register(CustomMTGP)
def _submodel_input_constructor_momtgp(
    botorch_model_class: type[CustomMTGP],
    model_config: ModelConfig,
    dataset: SupervisedDataset,
    search_space_digest: SearchSpaceDigest,
    surrogate: Surrogate,
) -> dict[str, Any]:
    formatted_model_inputs: dict[str, Any] = botorch_model_class.construct_inputs(
        training_data=dataset,
        **model_config.model_options,
    )
    submodules = _construct_submodules(
        model_config=model_config,
        dataset=dataset,
        # This is used when constructing the input transforms.
        search_space_digest=search_space_digest,
        # Used to check for supported arguments and in covar module input constructors.
        botorch_model_class=botorch_model_class,
    )
    formatted_model_inputs.update(submodules)
    return formatted_model_inputs
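
As a quick sanity check of the reshaping, you can call the constructor on a toy dataset (illustrative values, not part of the experiment below):

toy = SupervisedDataset(
    X=torch.rand(4, 2, dtype=torch.double),
    Y=torch.rand(4, 2, dtype=torch.double),
    feature_names=["x1", "x2"],
    outcome_names=["metric1", "metric2"],
)
inputs = CustomMTGP.construct_inputs(training_data=toy)
# Xs are stacked per task with a task column appended; Ys collapse to one column.
print(inputs["train_X"].shape, inputs["train_Y"].shape)
# torch.Size([8, 3]) torch.Size([8, 1])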

Construct the GS and set up the experiment, using two dummy metrics.

# Note: `surrogate_spec` here should wrap CustomMTGP in a ModelConfig with
# allow_batched_models=True; see the SurrogateSpec definition in the
# follow-up comment below.
generator_spec = GeneratorSpec(
    generator_enum=Generators.BOTORCH_MODULAR,
    model_kwargs={"surrogate_spec": surrogate_spec},
)
generation_strategy = construct_generation_strategy(
    generator_spec=generator_spec,
    node_name="MTGP with MetricsAsTask",
)


client = Client()

# Define two float parameters x1, x2 in unit hypercube.
range_parameters = [
    RangeParameterConfig(name="x1", parameter_type="float", bounds=(0, 1)),
    RangeParameterConfig(name="x2", parameter_type="float", bounds=(0, 1)),
]
client.configure_experiment(parameters=range_parameters)
objective = "-metric1, metric2"  # Minimize metric1 and maximize metric2.
client.configure_optimization(objective=objective)

def test_function(x1, x2):
    # A made-up function.
    return x1**2.0 - (x2 + 5.0) ** 0.75 / 4.0, (x1 + x2) ** 0.5

client.set_generation_strategy(
    generation_strategy=generation_strategy,
)

Running the experiment. Note the mock forcing use of a single model.


# Patch `use_model_list` to return False so that Ax combines both metrics into
# a single SupervisedDataset and fits one CustomMTGP rather than a ModelList.
with mock.patch(
    "ax.generators.torch.botorch_modular.surrogate.use_model_list", return_value=False
):
    for _ in range(10):
        trials = client.get_next_trials(max_trials=1)
        for index, parameters in trials.items():
            result = test_function(**parameters)
            client.complete_trial(
                trial_index=index, raw_data={"metric1": result[0], "metric2": result[1]}
            )

Runs as usual

[INFO 11-25 09:47:14] ax.api.client: Generated new trial 0 with parameters {'x1': 0.5, 'x2': 0.5} using GenerationNode CenterOfSearchSpace.
[INFO 11-25 09:47:14] ax.api.client: Trial 0 marked COMPLETED.
[INFO 11-25 09:47:14] ax.api.client: Generated new trial 1 with parameters {'x1': 0.475107, 'x2': 0.592524} using GenerationNode Sobol.
[INFO 11-25 09:47:14] ax.api.client: Trial 1 marked COMPLETED.
[INFO 11-25 09:47:14] ax.api.client: Generated new trial 2 with parameters {'x1': 0.578763, 'x2': 0.037122} using GenerationNode Sobol.
[INFO 11-25 09:47:14] ax.api.client: Trial 2 marked COMPLETED.
[INFO 11-25 09:47:14] ax.api.client: Generated new trial 3 with parameters {'x1': 0.95067, 'x2': 0.862344} using GenerationNode Sobol.
[INFO 11-25 09:47:14] ax.api.client: Trial 3 marked COMPLETED.
[INFO 11-25 09:47:14] ax.api.client: Generated new trial 4 with parameters {'x1': 0.120458, 'x2': 0.261442} using GenerationNode Sobol.
[INFO 11-25 09:47:14] ax.api.client: Trial 4 marked COMPLETED.
[W 251125 09:47:14 winsorize:123] Encountered a `MultiObjective` without objective thresholds. We will winsorize each objective separately. We strongly recommend specifying the objective thresholds when using multi-objective optimization.
[INFO 11-25 09:47:15] ax.api.client: Generated new trial 5 with parameters {'x1': 0.595547, 'x2': 0.871447} using GenerationNode MTGP with MetricsAsTask.
[INFO 11-25 09:47:15] ax.api.client: Trial 5 marked COMPLETED.
[W 251125 09:47:15 winsorize:123] Encountered a `MultiObjective` without objective thresholds. We will winsorize each objective separately. We strongly recommend specifying the objective thresholds when using multi-objective optimization.
[INFO 11-25 09:47:16] ax.api.client: Generated new trial 6 with parameters {'x1': 0.256358, 'x2': 1.0} using GenerationNode MTGP with MetricsAsTask.
[INFO 11-25 09:47:16] ax.api.client: Trial 6 marked COMPLETED.
[W 251125 09:47:16 winsorize:123] Encountered a `MultiObjective` without objective thresholds. We will winsorize each objective separately. We strongly recommend specifying the objective thresholds when using multi-objective optimization.
[INFO 11-25 09:47:17] ax.api.client: Generated new trial 7 with parameters {'x1': 0.724939, 'x2': 1.0} using GenerationNode MTGP with MetricsAsTask.
[INFO 11-25 09:47:17] ax.api.client: Trial 7 marked COMPLETED.
[W 251125 09:47:17 winsorize:123] Encountered a `MultiObjective` without objective thresholds. We will winsorize each objective separately. We strongly recommend specifying the objective thresholds when using multi-objective optimization.
[INFO 11-25 09:47:18] ax.api.client: Generated new trial 8 with parameters {'x1': 0.47117, 'x2': 1.0} using GenerationNode MTGP with MetricsAsTask.
[INFO 11-25 09:47:18] ax.api.client: Trial 8 marked COMPLETED.
[W 251125 09:47:18 winsorize:123] Encountered a `MultiObjective` without objective thresholds. We will winsorize each objective separately. We strongly recommend specifying the objective thresholds when using multi-objective optimization.
[INFO 11-25 09:47:19] ax.api.client: Generated new trial 9 with parameters {'x1': 0.597719, 'x2': 1.0} using GenerationNode MTGP with MetricsAsTask.
[INFO 11-25 09:47:19] ax.api.client: Trial 9 marked COMPLETED.

Verify that the multi-output MTGP model was used; the posterior mean has one column per task/metric:

mtgp = client._generation_strategy.adapter.generator.surrogate.model
mtgp.posterior(torch.tensor([[0.5, 0.5]], dtype=torch.double)).mean
tensor([[-0.1631, -0.3612]], dtype=torch.float64, grad_fn=<TransposeBackward0>)

Hope this is useful!

saitcakmak · Nov 25 '25

Hi Sait,

Thank you for the help with this. It's very appreciated!

I tried running the code you suggested, only adding

surrogate_spec = SurrogateSpec(
    model_configs=[
        ModelConfig(
            botorch_model_class=CustomMTGP,
        )
    ],
    allow_batched_models=True,
)

and can verify that it seems to work as intended.

I'll try to integrate this into our problem, and hopefully things will work out.

Appreciate the help once again!

-Leonardo

leonardoguilhoto · Nov 26 '25