Slowdown in optimize_acqf with outcome constraints after upgrade to 0.16.1
What happened?
Hello, botorch developers.
I am using botorch to tackle multi-objective optimization problems with parameter constraints and outcome constraints.
After upgrading from botorch 0.14 to 0.16.1, I noticed that the time spent inside optimize_acqf, specifically in gen_batch_initial_conditions, increased by roughly a factor of 1.5.
By timing each commit with the attached script, I traced the slowdown to the changes in #2920. I understand that this commit is intended to address an OOM issue by improving memory usage and execution efficiency, but I am not sure why it results in slower performance in my cases. Whenever outcome constraints are present, I observe a similar slowdown not only in the problem in the attached script but also in several other problems, so it does not seem to depend on a specific optimization setup.
Please provide a minimal, reproducible example of the unexpected behavior.
```python
from time import time

import numpy as np
import torch
from gpytorch.mlls import SumMarginalLogLikelihood

import botorch
from botorch.models import SingleTaskGP, ModelListGP
from botorch.utils.transforms import normalize, standardize
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition.multi_objective import qLogExpectedHypervolumeImprovement
from botorch.acquisition.multi_objective import IdentityMCMultiOutputObjective
from botorch.utils.multi_objective.box_decompositions import NondominatedPartitioning
from botorch.sampling import SobolQMCNormalSampler
from botorch.optim import optimize_acqf
from botorch.optim.initializers import gen_batch_initial_conditions


# gen_batch_initial_conditions with time check
def time_check_gen_batch_initial_conditions(*args, **kwargs):
    start = time()
    out = gen_batch_initial_conditions(*args, **kwargs)
    end = time()
    print(f' gen_batch_initial_conditions: {end - start:.2f} sec.')
    return out


# multi-objective function
def objectives(X: torch.Tensor):
    return X


# function for the outcome constraint
def constraint_violations(X: torch.Tensor):
    """x + y <= 10"""
    x = X[:, 0].unsqueeze(-1)
    y = X[:, 1].unsqueeze(-1)
    return x + y - 10.


# callable for the Monte Carlo constraint
def cv_func_monte_carlo(Z, idx):
    return Z[..., idx]


def main():
    print(f'botorch version: {botorch.__version__}')
    for seed in range(3):
        np.random.seed(seed)
        with botorch.manual_seed(seed):
            # initial data
            N = 50
            bounds = np.array(((0, 10), (0, 10)), dtype=float)
            x = np.concatenate(
                [
                    np.random.rand(N, 1) * (bounds[i][1] - bounds[i][0]) + bounds[i][0]
                    for i in range(len(bounds))
                ],
                axis=-1,
            )
            # make tensors
            B = torch.tensor(bounds).transpose(0, 1)
            X = torch.tensor(x)
            # compute feasibility
            CV = constraint_violations(X)
            feas_idx = (CV <= 0).all(dim=-1)
            # compute objectives on the feasible points
            feas_X = X[feas_idx]
            feas_Y = objectives(feas_X)
            print(f' {len(feas_X)=}')
            # detach, transform
            feas_X = normalize(feas_X.detach().clone(), bounds=B)
            feas_Y = standardize(feas_Y.detach().clone())
            X = normalize(X.detach().clone(), bounds=B)
            CV = standardize(CV.detach().clone())
            # objective model
            model = SingleTaskGP(
                train_X=feas_X,
                train_Y=feas_Y,
                input_transform=None,
                outcome_transform=None,
            )
            # constraint model
            model_con = SingleTaskGP(
                train_X=X.detach().clone(),
                train_Y=CV.detach().clone(),
                input_transform=None,
                outcome_transform=None,
            )
            # combine the two models and fit
            models = ModelListGP(model, model_con)
            mll = SumMarginalLogLikelihood(likelihood=models.likelihood, model=models)
            fit_gpytorch_mll(mll)
            # acquisition function setup
            objective = IdentityMCMultiOutputObjective(
                outcomes=list(range(feas_Y.size(-1))),
            )
            # bind i via a default argument to avoid the late-binding closure pitfall
            constraints = [
                lambda Z, i=i: cv_func_monte_carlo(Z, i + feas_Y.size(-1))
                for i in range(CV.size(-1))
            ]
            sampler = SobolQMCNormalSampler(sample_shape=torch.Size((256,)))
            ref_point = feas_Y.min(dim=0).values.detach().clone()
            ref_point -= 1e-8
            alpha = 0.0
            partitioning = NondominatedPartitioning(
                ref_point=ref_point,
                Y=feas_Y,
                alpha=alpha,
            )
            acqf = qLogExpectedHypervolumeImprovement(
                model=models,
                ref_point=ref_point.tolist(),
                partitioning=partitioning,
                sampler=sampler,
                objective=objective,
                constraints=constraints,
            )
            # optimize the acquisition function
            options = {
                "batch_limit": 1,  # required for non-linear inequality constraints
                "maxiter": 200,
            }
            start = time()
            candidate, acqf_value = optimize_acqf(
                acq_function=acqf,
                bounds=B,
                q=1,
                num_restarts=20,
                raw_samples=1024,
                options=options,
                return_best_only=True,
                sequential=True,
                # nonlinear_inequality_constraints=...,
                ic_generator=time_check_gen_batch_initial_conditions,
            )
            end = time()
            print(f' Elapsed time: {end - start:.2f} sec.')


if __name__ == '__main__':
    main()
```
Execution summary of gen_batch_initial_conditions time (mean ± stddev over 3 seeds)
batch_limit = 1
| botorch version | mean ± stddev (sec) |
|---|---|
| 0.14.0 | 7.87 ± 0.53 |
| 0.14.1.dev41 | 7.74 ± 0.53 |
| 0.14.1.dev42 | 12.59 ± 1.38 |
| 0.16.1 | 11.51 ± 0.56 |
batch_limit = 5
| botorch version | mean ± stddev (sec) |
|---|---|
| 0.14.0 | 2.92 ± 0.36 |
| 0.14.1.dev41 | 2.98 ± 0.50 |
| 0.14.1.dev42 | 3.92 ± 0.79 |
| 0.16.1 | 3.92 ± 0.81 |
all data

batch_limit: 1

```
botorch version: 0.14.0
len(feas_X)=29
gen_batch_initial_conditions: 8.33 sec.
Elapsed time: 12.07 sec.
len(feas_X)=25
gen_batch_initial_conditions: 7.29 sec.
Elapsed time: 10.95 sec.
len(feas_X)=28
gen_batch_initial_conditions: 7.99 sec.
Elapsed time: 12.36 sec.
```

(commit: 44299a143fc1efe6f087d6684c724e9e71cc1cdd)

```
botorch version: 0.14.1.dev41+g44299a143
len(feas_X)=29
gen_batch_initial_conditions: 8.28 sec.
Elapsed time: 11.97 sec.
len(feas_X)=25
gen_batch_initial_conditions: 7.23 sec.
Elapsed time: 11.00 sec.
len(feas_X)=28
gen_batch_initial_conditions: 7.72 sec.
Elapsed time: 12.30 sec.
```

(commit: 71f03ea20abf219034f01db0db528ad7a74167aa)

```
botorch version: 0.14.1.dev42+g71f03ea20
len(feas_X)=29
gen_batch_initial_conditions: 14.11 sec.
Elapsed time: 19.73 sec.
len(feas_X)=25
gen_batch_initial_conditions: 12.22 sec.
Elapsed time: 17.29 sec.
len(feas_X)=28
gen_batch_initial_conditions: 11.43 sec.
Elapsed time: 17.30 sec.
```

```
botorch version: 0.16.1
len(feas_X)=29
gen_batch_initial_conditions: 12.04 sec.
Elapsed time: 16.83 sec.
len(feas_X)=25
gen_batch_initial_conditions: 10.93 sec.
Elapsed time: 15.95 sec.
len(feas_X)=28
gen_batch_initial_conditions: 11.56 sec.
Elapsed time: 17.43 sec.
```

batch_limit: 5

```
botorch version: 0.14.0
len(feas_X)=29
gen_batch_initial_conditions: 3.31 sec.
Elapsed time: 9.65 sec.
len(feas_X)=25
gen_batch_initial_conditions: 2.60 sec.
Elapsed time: 7.43 sec.
len(feas_X)=28
gen_batch_initial_conditions: 2.85 sec.
Elapsed time: 9.11 sec.
```

(commit: 44299a143fc1efe6f087d6684c724e9e71cc1cdd)

```
botorch version: 0.14.1.dev41+g44299a143
len(feas_X)=29
gen_batch_initial_conditions: 3.48 sec.
Elapsed time: 5.41 sec.
len(feas_X)=25
gen_batch_initial_conditions: 2.48 sec.
Elapsed time: 4.30 sec.
len(feas_X)=28
gen_batch_initial_conditions: 2.99 sec.
Elapsed time: 5.48 sec.
```

(commit: 71f03ea20abf219034f01db0db528ad7a74167aa)

```
botorch version: 0.14.1.dev42+g71f03ea20
len(feas_X)=29
gen_batch_initial_conditions: 4.79 sec.
Elapsed time: 7.28 sec.
len(feas_X)=25
gen_batch_initial_conditions: 3.26 sec.
Elapsed time: 5.46 sec.
len(feas_X)=28
gen_batch_initial_conditions: 3.71 sec.
Elapsed time: 6.85 sec.
```

```
botorch version: 0.16.1
len(feas_X)=29
gen_batch_initial_conditions: 4.82 sec.
Elapsed time: 7.34 sec.
len(feas_X)=25
gen_batch_initial_conditions: 3.25 sec.
Elapsed time: 5.48 sec.
len(feas_X)=28
gen_batch_initial_conditions: 3.68 sec.
Elapsed time: 6.85 sec.
```
Please paste any relevant traceback/logs produced by the example provided.
BoTorch Version
0.14.0, 0.16.1
Python Version
3.13.7
Operating System
Windows 11
(Optional) Describe any potential fixes you've considered to the issue outlined above.
I’m not sure what kind of improvement would be appropriate, but one idea that comes to mind is to provide an option to use the previous implementation.
Thank you very much for your time and support.
Pull Request
None
Code of Conduct
- [x] I agree to follow BoTorch's Code of Conduct
> I am not sure why it results in slower performance in my cases
I think to get to the bottom of this we'd want to profile it with cProfile to understand where the time is being spent. At a high level, if there is a memory vs. compute tradeoff here, then I wouldn't be surprised to see some degree of slowdown, but we'd really need a detailed profile to say.
> Whenever outcome constraints are present, I observe a similar slowdown not only in the problem in the attached script but also in several other problems, so it does not seem to depend on a specific optimization setup.
Can you clarify what exactly causes the slowdown? Just the fact that outcome constraints are present? I would expect that to lead to slowdowns since presumably the model has more outcomes to predict and the acquisition function has more work to do. Or are you saying that the change in https://github.com/meta-pytorch/botorch/pull/2920 affects speed in the presence of outcome constraints in particular?
Thank you very much for your response.
To clarify, I fully understand that within the same BoTorch version, having outcome constraints will naturally make the optimization slower than not having them. I do not consider that to be an issue.
What I want to report is the difference in runtime before and after #2920 (between 44299a143fc1efe6f087d6684c724e9e71cc1cdd and 71f03ea20abf219034f01db0db528ad7a74167aa) when outcome constraints are present.
My original post may have included too much information and become harder to read; my apologies if the main point was unclear.
As suggested, I will try profiling with cProfile next. If there are any specific functions or parts of the codebase that would be particularly helpful to inspect, I would greatly appreciate your guidance.
> As suggested, I will try profiling with cProfile next. If there are any specific functions or parts of the codebase that would be particularly helpful to inspect, I would greatly appreciate your guidance.
I think simply profiling gen_batch_initial_conditions with cProfile and comparing the traces should point us to the right place.
Thank you very much for your time.
I made some adjustments to the earlier code and ran cProfile on gen_batch_initial_conditions for both 0.14.1.dev41 (before #2920) and 0.14.1.dev42 (after #2920). Since GitHub does not accept .prof files as attachments, I have attached the results as .txt files instead. This is my first time using cProfile, so please let me know if any additional information or a different format would be more helpful.
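The adjustment was essentially a cProfile wrapper around the initial-condition generator, passed to optimize_acqf via ic_generator just like the timing wrapper above. Roughly along these lines (a sketch; the output filename is illustrative):

```python
import cProfile

from botorch.optim.initializers import gen_batch_initial_conditions


def profiled_gen_batch_initial_conditions(*args, **kwargs):
    # Profile only the initializer, which is where the regression shows up,
    # and dump the stats to a file for offline inspection.
    profiler = cProfile.Profile()
    profiler.enable()
    out = gen_batch_initial_conditions(*args, **kwargs)
    profiler.disable()
    profiler.dump_stats("gen_batch_initial_conditions.prof")  # illustrative name
    return out
```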
I also performed a simple measurement of memory usage. The results are short, so I am including them below.
0.14.1.dev41 (before #2920)

```
botorch version: 0.14.1.dev41+g44299a143
len(feas_X)=29
Filename: e:\constrained-botorch-sampler\benchmarks\botorch_gen_batch_initial_condition_speed_compare.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   156    305.7 MiB    305.7 MiB           1   @profile
   157                                         def time_check_gen_batch_initial_conditions(*args, **kwargs):
   158
   164    311.6 MiB      5.9 MiB           1       out = gen_batch_initial_conditions(*args, **kwargs)
   169    311.6 MiB      0.0 MiB           1       return out

Elapsed time: 27.53 sec.
```

0.14.1.dev42 (after #2920)

```
botorch version: 0.14.1.dev42+g71f03ea20
len(feas_X)=29
Filename: e:\constrained-botorch-sampler\benchmarks\botorch_gen_batch_initial_condition_speed_compare.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   156    306.1 MiB    306.1 MiB           1   @profile
   157                                         def time_check_gen_batch_initial_conditions(*args, **kwargs):
   158
   164    312.7 MiB      6.6 MiB           1       out = gen_batch_initial_conditions(*args, **kwargs)
   169    312.7 MiB      0.0 MiB           1       return out

Elapsed time: 46.78 sec.
```
If there is any missing information or if certain parts would benefit from a more focused profile, I would be happy to run additional measurements.
Thank you again for your support.
> Since GitHub does not accept .prof files as attachments, I have attached the results as .txt files instead. This is my first time using cProfile, so please let me know if any additional information or a different format would be more helpful.
Could you post the .prof files simply by appending a .txt suffix manually? Or share them in some other way (e.g. a Google Drive link)? That way we can look at the results through visualization tools (e.g. snakeviz: https://jiffyclub.github.io/snakeviz/) that make this a lot easier to interpret.
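For example, once a file is renamed back to .prof, a minimal sketch of inspecting it (the filename is a placeholder):

```python
# pip install snakeviz
# snakeviz gen_batch_initial_conditions.prof  # interactive view in the browser
# Or, with only the standard library:
import pstats

stats = pstats.Stats("gen_batch_initial_conditions.prof")  # placeholder filename
stats.sort_stats("cumulative").print_stats(20)  # top 20 entries by cumulative time
```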
Ah, that makes sense. Here are the .prof files with the extensions changed as suggested.
I looked into this a little bit; the main thing that pops out to me is that the dev42 version spends a lot more time generating many more CatLinearOperator objects (22528 calls vs. 6144 for dev41). Presumably that is due to extracting the covariance submatrix for each task individually, rather than using the slicing operation on the linear operator as before.
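For reference, this kind of comparison can be pulled out of the two traces with the stdlib pstats module (a sketch; the filenames are placeholders):

```python
import pstats

for fname in ("dev41.prof", "dev42.prof"):  # placeholder filenames
    print(fname)
    # print_stats accepts a regex restriction, so this prints only the
    # entries whose file/function name mentions CatLinearOperator
    pstats.Stats(fname).sort_stats("ncalls").print_stats("CatLinearOperator")
```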
I'd have to dig into this some more to understand what exactly is going on though. Do I read your code correctly that this example uses a model with 10 outcomes as objectives plus another constraint outcome?
Thank you very much for looking into this.
In this example, the objectives are two-dimensional (feas_Y), and there is an additional one-dimensional constraint (CV). We prepare one SingleTaskGP for the 2D objectives (feas_Y as a whole) and another SingleTaskGP for the 1D constraint (CV), then combine the two with ModelListGP and train them jointly.
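Schematically, reusing the names from the script above (a sketch of the structure, not the exact code):

```python
from botorch.models import ModelListGP, SingleTaskGP

# feas_X/feas_Y: feasible inputs and the 2 objective columns;
# X/CV: all inputs and the 1 constraint-violation column (see the script above)
model = SingleTaskGP(train_X=feas_X, train_Y=feas_Y)  # 2-outcome objective model
model_con = SingleTaskGP(train_X=X, train_Y=CV)       # 1-outcome constraint model
models = ModelListGP(model, model_con)                # 3 outcomes in total
```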
The value 10 simply represents the upper bound of the inputs; I chose a bound other than 1 only to test whether normalization works correctly.