Slowdown in optimize_acqf with outcome constraints after upgrade to 0.16.1
What happened?
Hello, botorch developers.
I am using botorch to tackle multi-objective optimization problems with parameter constraints and outcome constraints.
After upgrading from botorch 0.14 to 0.16.1, I noticed that the time spent inside optimize_acqf, specifically in gen_batch_initial_conditions, increased by roughly a factor of 1.5.
By timing each commit with the attached script, I traced the slowdown to the changes in #2920. I understand that this commit is intended to address an OOM issue by improving memory usage and execution efficiency, but I am not sure why it results in slower performance in my cases. Whenever outcome constraints are present, I observe a similar slowdown not only in the problem in the attached script but also in several other problems, so it does not seem to depend on a specific optimization setup.
Please provide a minimal, reproducible example of the unexpected behavior.
```python
from time import time

import numpy as np
import torch
from gpytorch.mlls import SumMarginalLogLikelihood

import botorch
from botorch.models import SingleTaskGP, ModelListGP
from botorch.utils.transforms import normalize, standardize
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition.multi_objective import qLogExpectedHypervolumeImprovement
from botorch.acquisition.multi_objective import IdentityMCMultiOutputObjective
from botorch.utils.multi_objective.box_decompositions import NondominatedPartitioning
from botorch.sampling import SobolQMCNormalSampler
from botorch.optim import optimize_acqf
from botorch.optim.initializers import gen_batch_initial_conditions


# gen_batch_initial_conditions with time check
def time_check_gen_batch_initial_conditions(*args, **kwargs):
    start = time()
    out = gen_batch_initial_conditions(*args, **kwargs)
    end = time()
    print(f' gen_batch_initial_conditions: {end - start:.2f} sec.')
    return out


# multi-objective function
def objectives(X: torch.Tensor):
    return X


# function for the outcome constraint
def constraint_violations(X: torch.Tensor):
    """x + y <= 10"""
    x = X[:, 0].unsqueeze(-1)
    y = X[:, 1].unsqueeze(-1)
    return x + y - 10.


# callable for the Monte Carlo constraint
def cv_func_monte_carlo(Z, idx):
    return Z[..., idx]


def main():
    print(f'botorch version: {botorch.__version__}')
    for seed in range(3):
        np.random.seed(seed)
        with botorch.manual_seed(seed):
            # initial data
            N = 50
            bounds = np.array(((0, 10), (0, 10)), dtype=float)
            x = np.concatenate(
                [
                    np.random.rand(N, 1) * (bounds[i][1] - bounds[i][0]) + bounds[i][0]
                    for i in range(len(bounds))
                ],
                axis=-1,
            )
            # make tensors
            B = torch.tensor(bounds).transpose(0, 1)
            X = torch.tensor(x)
            # compute feasibility
            CV = constraint_violations(X)
            feas_idx = (CV <= 0).all(dim=-1)
            # compute objectives on the feasible points
            feas_X = X[feas_idx]
            feas_Y = objectives(feas_X)
            print(f' {len(feas_X)=}')
            # detach, transform
            feas_X = normalize(feas_X.detach().clone(), bounds=B)
            feas_Y = standardize(feas_Y.detach().clone())
            X = normalize(X.detach().clone(), bounds=B)
            CV = standardize(CV.detach().clone())
            # objective model
            model = SingleTaskGP(
                train_X=feas_X,
                train_Y=feas_Y,
                input_transform=None,
                outcome_transform=None,
            )
            # constraint model
            model_con = SingleTaskGP(
                train_X=X.detach().clone(),
                train_Y=CV.detach().clone(),
                input_transform=None,
                outcome_transform=None,
            )
            # combine the two models and fit
            models = ModelListGP(model, model_con)
            mll = SumMarginalLogLikelihood(likelihood=models.likelihood, model=models)
            fit_gpytorch_mll(mll)
            # acquisition function setup
            objective = IdentityMCMultiOutputObjective(
                outcomes=list(range(feas_Y.size(-1))),
            )
            # bind i via a default argument to avoid the late-binding closure pitfall
            constraints = [
                lambda Z, i=i: cv_func_monte_carlo(Z, i + feas_Y.size(-1))
                for i in range(CV.size(-1))
            ]
            sampler = SobolQMCNormalSampler(sample_shape=torch.Size((256,)))
            ref_point = feas_Y.min(dim=0).values.detach().clone()
            ref_point -= 1e-8
            alpha = 0.0
            partitioning = NondominatedPartitioning(
                ref_point=ref_point,
                Y=feas_Y,
                alpha=alpha,
            )
            acqf = qLogExpectedHypervolumeImprovement(
                model=models,
                ref_point=ref_point.tolist(),
                partitioning=partitioning,
                sampler=sampler,
                objective=objective,
                constraints=constraints,
            )
            # optimize the acquisition function
            options = {
                "batch_limit": 1,  # required for non-linear inequality constraints
                "maxiter": 200,
            }
            start = time()
            candidate, acqf_value = optimize_acqf(
                acq_function=acqf,
                bounds=B,
                q=1,
                num_restarts=20,
                raw_samples=1024,
                options=options,
                return_best_only=True,
                sequential=True,
                # nonlinear_inequality_constraints=...,
                ic_generator=time_check_gen_batch_initial_conditions,
            )
            end = time()
            print(f' Elapsed time: {end - start:.2f} sec.')


if __name__ == '__main__':
    main()
```
Execution summary of gen_batch_initial_conditions time (mean ± stddev over 3 seeds)
batch_limit = 1
| botorch version | mean ± stddev (sec) |
|---|---|
| 0.14.0 | 7.87 ± 0.53 |
| 0.14.1.dev41 | 7.74 ± 0.53 |
| 0.14.1.dev42 | 12.59 ± 1.38 |
| 0.16.1 | 11.51 ± 0.56 |
batch_limit = 5
| botorch version | mean ± stddev (sec) |
|---|---|
| 0.14.0 | 2.92 ± 0.36 |
| 0.14.1.dev41 | 2.98 ± 0.50 |
| 0.14.1.dev42 | 3.92 ± 0.79 |
| 0.16.1 | 3.92 ± 0.81 |
all data

batch_limit: 1

```
botorch version: 0.14.0
len(feas_X)=29
gen_batch_initial_conditions: 8.33 sec.
Elapsed time: 12.07 sec.
len(feas_X)=25
gen_batch_initial_conditions: 7.29 sec.
Elapsed time: 10.95 sec.
len(feas_X)=28
gen_batch_initial_conditions: 7.99 sec.
Elapsed time: 12.36 sec.
```

(commit: 44299a143fc1efe6f087d6684c724e9e71cc1cdd)

```
botorch version: 0.14.1.dev41+g44299a143
len(feas_X)=29
gen_batch_initial_conditions: 8.28 sec.
Elapsed time: 11.97 sec.
len(feas_X)=25
gen_batch_initial_conditions: 7.23 sec.
Elapsed time: 11.00 sec.
len(feas_X)=28
gen_batch_initial_conditions: 7.72 sec.
Elapsed time: 12.30 sec.
```

(commit: 71f03ea20abf219034f01db0db528ad7a74167aa)

```
botorch version: 0.14.1.dev42+g71f03ea20
len(feas_X)=29
gen_batch_initial_conditions: 14.11 sec.
Elapsed time: 19.73 sec.
len(feas_X)=25
gen_batch_initial_conditions: 12.22 sec.
Elapsed time: 17.29 sec.
len(feas_X)=28
gen_batch_initial_conditions: 11.43 sec.
Elapsed time: 17.30 sec.
```

```
botorch version: 0.16.1
len(feas_X)=29
gen_batch_initial_conditions: 12.04 sec.
Elapsed time: 16.83 sec.
len(feas_X)=25
gen_batch_initial_conditions: 10.93 sec.
Elapsed time: 15.95 sec.
len(feas_X)=28
gen_batch_initial_conditions: 11.56 sec.
Elapsed time: 17.43 sec.
```

batch_limit: 5

```
botorch version: 0.14.0
len(feas_X)=29
gen_batch_initial_conditions: 3.31 sec.
Elapsed time: 9.65 sec.
len(feas_X)=25
gen_batch_initial_conditions: 2.60 sec.
Elapsed time: 7.43 sec.
len(feas_X)=28
gen_batch_initial_conditions: 2.85 sec.
Elapsed time: 9.11 sec.
```

(commit: 44299a143fc1efe6f087d6684c724e9e71cc1cdd)

```
botorch version: 0.14.1.dev41+g44299a143
len(feas_X)=29
gen_batch_initial_conditions: 3.48 sec.
Elapsed time: 5.41 sec.
len(feas_X)=25
gen_batch_initial_conditions: 2.48 sec.
Elapsed time: 4.30 sec.
len(feas_X)=28
gen_batch_initial_conditions: 2.99 sec.
Elapsed time: 5.48 sec.
```

(commit: 71f03ea20abf219034f01db0db528ad7a74167aa)

```
botorch version: 0.14.1.dev42+g71f03ea20
len(feas_X)=29
gen_batch_initial_conditions: 4.79 sec.
Elapsed time: 7.28 sec.
len(feas_X)=25
gen_batch_initial_conditions: 3.26 sec.
Elapsed time: 5.46 sec.
len(feas_X)=28
gen_batch_initial_conditions: 3.71 sec.
Elapsed time: 6.85 sec.
```

```
botorch version: 0.16.1
len(feas_X)=29
gen_batch_initial_conditions: 4.82 sec.
Elapsed time: 7.34 sec.
len(feas_X)=25
gen_batch_initial_conditions: 3.25 sec.
Elapsed time: 5.48 sec.
len(feas_X)=28
gen_batch_initial_conditions: 3.68 sec.
Elapsed time: 6.85 sec.
```
Please paste any relevant traceback/logs produced by the example provided.
BoTorch Version
0.14.0, 0.16.1
Python Version
3.13.7
Operating System
Windows 11
(Optional) Describe any potential fixes you've considered to the issue outlined above.
I’m not sure what kind of improvement would be appropriate, but one idea that comes to mind is to provide an option to use the previous implementation.
Thank you very much for your time and support.
Pull Request
None
Code of Conduct
- [x] I agree to follow BoTorch's Code of Conduct
> I am not sure why it results in slower performance in my cases
I think to get to the bottom of this we'd want to profile it with cProfile to understand where the time is being spent. At a high level, if there is a memory vs. compute tradeoff here, then I wouldn't be surprised to see some degree of slowdown, but we'd really need a detailed profile to say.
> Whenever outcome constraints are present, I observe a similar slowdown not only in the problem in the attached script but also in several other problems, so it does not seem to depend on a specific optimization setup.
Can you clarify what exactly causes the slowdown? Just the fact that outcome constraints are present? I would expect that to lead to slowdowns since presumably the model has more outcomes to predict and the acquisition function has more work to do. Or are you saying that the change in https://github.com/meta-pytorch/botorch/pull/2920 affects speed in the presence of outcome constraints in particular?
Thank you very much for your response.
To clarify, I fully understand that within the same BoTorch version, having outcome constraints will naturally make the optimization slower than not having them. I do not consider that to be an issue.
What I want to report is the difference in runtime before and after #2920 (between 44299a143fc1efe6f087d6684c724e9e71cc1cdd and 71f03ea20abf219034f01db0db528ad7a74167aa) when outcome constraints are present.
My original post may have included too much information and become harder to read; my apologies if the main point was unclear.
As suggested, I will try profiling with cProfile next. If there are any specific functions or parts of the codebase that would be particularly helpful to inspect, I would greatly appreciate your guidance.
> As suggested, I will try profiling with cProfile next. If there are any specific functions or parts of the codebase that would be particularly helpful to inspect, I would greatly appreciate your guidance.
I think simply profiling gen_batch_initial_conditions with cProfile and comparing the traces should point us to the right place.
Thank you very much for your time.
I made some adjustments to the earlier code and ran cProfile on gen_batch_initial_conditions for both 0.14.1.dev41 (before #2920) and 0.14.1.dev42 (after #2920). Since GitHub does not accept .prof files as attachments, I have attached the results as .txt files instead. This is my first time using cProfile, so please let me know if any additional information or a different format would be more helpful.
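The adjustment was essentially a cProfile wrapper around the initial-condition generator, passed to optimize_acqf via ic_generator just like the timing wrapper above. Roughly along these lines (a sketch; the output filename is illustrative):

```python
import cProfile

from botorch.optim.initializers import gen_batch_initial_conditions


def profiled_gen_batch_initial_conditions(*args, **kwargs):
    # Profile only the initializer, which is where the regression shows up,
    # and dump the stats to a file for offline inspection.
    profiler = cProfile.Profile()
    profiler.enable()
    out = gen_batch_initial_conditions(*args, **kwargs)
    profiler.disable()
    profiler.dump_stats("gen_batch_initial_conditions.prof")  # illustrative name
    return out
```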
I also performed a simple measurement of memory usage. The results are short, so I am including them below.
0.14.1.dev41 (before #2920)

```
botorch version: 0.14.1.dev41+g44299a143
len(feas_X)=29
Filename: e:\constrained-botorch-sampler\benchmarks\botorch_gen_batch_initial_condition_speed_compare.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   156    305.7 MiB    305.7 MiB           1   @profile
   157                                         def time_check_gen_batch_initial_conditions(*args, **kwargs):
   158
   164    311.6 MiB      5.9 MiB           1       out = gen_batch_initial_conditions(*args, **kwargs)
   169    311.6 MiB      0.0 MiB           1       return out

Elapsed time: 27.53 sec.
```

0.14.1.dev42 (after #2920)

```
botorch version: 0.14.1.dev42+g71f03ea20
len(feas_X)=29
Filename: e:\constrained-botorch-sampler\benchmarks\botorch_gen_batch_initial_condition_speed_compare.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   156    306.1 MiB    306.1 MiB           1   @profile
   157                                         def time_check_gen_batch_initial_conditions(*args, **kwargs):
   158
   164    312.7 MiB      6.6 MiB           1       out = gen_batch_initial_conditions(*args, **kwargs)
   169    312.7 MiB      0.0 MiB           1       return out

Elapsed time: 46.78 sec.
```
If there is any missing information or if certain parts would benefit from a more focused profile, I would be happy to run additional measurements.
Thank you again for your support.
> Since GitHub does not accept .prof files as attachments, I have attached the results as .txt files instead. This is my first time using cProfile, so please let me know if any additional information or a different format would be more helpful.
Could you post the .prof files simply by appending a .txt suffix manually? Or share them in some other way (e.g. a Google Drive link)? That way we can look at the results through visualization tools (e.g. snakeviz: https://jiffyclub.github.io/snakeviz/) that make this a lot easier to interpret.
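For example, once a file is renamed back to .prof, a minimal sketch of inspecting it (the filename is a placeholder):

```python
# pip install snakeviz
# snakeviz gen_batch_initial_conditions.prof  # interactive view in the browser
# Or, with only the standard library:
import pstats

stats = pstats.Stats("gen_batch_initial_conditions.prof")  # placeholder filename
stats.sort_stats("cumulative").print_stats(20)  # top 20 entries by cumulative time
```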
Ah, that makes sense. Here are the .prof files with the extensions changed as suggested.
I looked into this a little bit; the main thing that pops out to me is that the dev42 version spends a lot more time generating many more CatLinearOperator objects (22528 calls vs. 6144 for dev41). Presumably that is due to extracting the covariance submatrix for each task individually, rather than using the slicing operation on the linear operator as before.
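For reference, this kind of comparison can be pulled out of the two traces with the stdlib pstats module (a sketch; the filenames are placeholders):

```python
import pstats

for fname in ("dev41.prof", "dev42.prof"):  # placeholder filenames
    print(fname)
    # print_stats accepts a regex restriction, so this prints only the
    # entries whose file/function name mentions CatLinearOperator
    pstats.Stats(fname).sort_stats("ncalls").print_stats("CatLinearOperator")
```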
I'd have to dig into this some more to understand what exactly is going on though. Do I read your code correctly that this example uses a model with 10 outcomes as objectives plus another constraint outcome?
Thank you very much for looking into this.
In this example, the objectives are two-dimensional (feas_Y), and there is an additional one-dimensional constraint (CV). We prepare one SingleTaskGP for the 2D objectives (feas_Y as a whole) and another SingleTaskGP for the 1D constraint (CV), then combine the two with ModelListGP and train them jointly.
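Schematically, reusing the names from the script above (a sketch of the structure, not the exact code):

```python
from botorch.models import ModelListGP, SingleTaskGP

# feas_X/feas_Y: feasible inputs and the 2 objective columns;
# X/CV: all inputs and the 1 constraint-violation column (see the script above)
model = SingleTaskGP(train_X=feas_X, train_Y=feas_Y)  # 2-outcome objective model
model_con = SingleTaskGP(train_X=X, train_Y=CV)       # 1-outcome constraint model
models = ModelListGP(model, model_con)                # 3 outcomes in total
```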
The value 10 simply represents the upper bound of the inputs; I chose a bound other than 1 only to test whether normalization works correctly.