
[Feature Request] Add an option to specify gen_candidates method in joint_optimize

Tenoke opened this issue on Jul 15, 2019 · 8 comments

🚀 Feature Request

Add an argument to joint_optimize that specifies the gen_candidates method, so you can easily use e.g. gen_candidates_torch instead of gen_candidates_scipy.

Motivation

It often makes sense to use a faster candidate generation method than the default, which is gen_candidates_scipy. I find myself needing to use gen_candidates_torch instead for bigger models (especially when dealing with higher dimensional data).

This would match other parts of the API - for example fit_gpytorch_model where you can easily pass optimizer=fit_gpytorch_torch instead of the default fit_gpytorch_scipy.


Pitch

Something like

def joint_optimize(
    acq_function: AcquisitionFunction,
    bounds: Tensor,
    q: int,
    num_restarts: int,
    raw_samples: int,
    options: Optional[Dict[str, Union[bool, float, int]]] = None,
    inequality_constraints: Optional[List[Tuple[Tensor, Tensor, float]]] = None,
    equality_constraints: Optional[List[Tuple[Tensor, Tensor, float]]] = None,
    fixed_features: Optional[Dict[int, float]] = None,
    post_processing_func: Optional[Callable[[Tensor], Tensor]] = None,
    gen_candidates: Callable[..., Tuple[Tensor, Tensor]] = gen_candidates_scipy,
) -> Tensor:
...
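The injected-callable pattern this signature proposes can be sketched generically. Note that the stub generators and joint_optimize_sketch below are hypothetical stand-ins to illustrate the dispatch, not BoTorch code:

```python
from typing import Callable, List

# Hypothetical stand-ins for the two candidate generators; the real ones
# run scipy- and torch-based optimization from the given initial conditions.
def gen_candidates_scipy_stub(ics: List[float]) -> List[float]:
    return [x * 0.5 for x in ics]

def gen_candidates_torch_stub(ics: List[float]) -> List[float]:
    return [x * 0.9 for x in ics]

def joint_optimize_sketch(
    ics: List[float],
    gen_candidates: Callable[[List[float]], List[float]] = gen_candidates_scipy_stub,
) -> List[float]:
    # joint_optimize would forward bounds/options/fixed_features here.
    return gen_candidates(ics)

print(joint_optimize_sketch([1.0, 2.0]))                             # default path -> [0.5, 1.0]
print(joint_optimize_sketch([1.0, 2.0], gen_candidates_torch_stub))  # opt-in path  -> [0.9, 1.8]
```

This mirrors the fit_gpytorch_model(mll, optimizer=...) convention: the default stays unchanged, and opting in is a one-argument change at the call site.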

If the feature is accepted, I can submit a PR for it.

Tenoke avatar Jul 15 '19 15:07 Tenoke

That seems reasonable to me in general.

One thing to note is that gen_candidates_torch in its current form is quite basic and may not work very well compared to gen_candidates_scipy. gen_candidates_scipy is really geared towards using deterministic optimization together with fixed base samples (see discussion here), which means it's essentially hyperparameter-free and provides faster convergence rates as well as well-specified convergence criteria.

gen_candidates_torch calls a check_convergence function, which at this point does no more than check the iteration count. Some more thought will have to go into this to make it universally applicable. The default Adam optimizer also wouldn't actually require the objective to be deterministic, so you could use a different sampler with resample=True in the acquisition function (for MC-based acquisition functions). So doing torch generation properly does require quite a bit more understanding of some of the internals, and we should make sure that users have that understanding.

Can I ask how (with which settings) you are using gen_candidates_torch on what kinds of problems? Did you compare the performance of gen_candidates_torch against gen_candidates_scipy? Just want to make sure you're not ending up with inferior solutions.

Balandat avatar Jul 15 '19 15:07 Balandat

gen_candidates_scipy always works better, of course, but I'm working with (most recently) a problem in 100+ dimensions. scipy will either run out of memory or take more days to complete than makes sense for my use case, as you'd expect. Even just switching and doing nothing else (I do specify the optimizer, though) works quite well (or, to be more precise, it works okay and is very fast). I haven't found the need to do more in this case, although I will likely need to go back to it later and improve performance.

gen_candidates_torch(
    initial_conditions=batch_initial_conditions[start_idx:end_idx],
    optimizer=torch.optim.SGD,
    acquisition_function=acq_function,
    lower_bounds=bounds[0],
    upper_bounds=bounds[1],
    options={
        k: v
        for k, v in options.items()
        if k not in ("batch_limit", "nonnegative")
    },
    fixed_features=fixed_features,
)

I'm using it for UCB with a custom GP in higher dimensions on a black box problem with few assumptions (other than bounds).

The only other thing I've ended up doing is increasing the iterations in check_convergence but for my problems, even the default seems to get close enough. Of course, a more involved check_convergence would be great.

At any rate, improvements to gen_candidates_torch and a better understanding of its details would be nice, but when you simply don't have the time/memory for scipy, just swapping it in is a big improvement, since you can now run your model at all. That doesn't change that the default should remain scipy, as it's always going to be better when you can use it.

Tenoke avatar Jul 15 '19 16:07 Tenoke

a problem in 100+ dimensions

I'm curious, are you using any dimensionality reduction techniques here (DKL, random projections...)? Seems like a vanilla GP won't work very well in this high a dimension. How many observations are you working with?

scipy will either run out of memory or take more days to complete than makes sense for my use case, as you'd expect.

Hmm, it's somewhat surprising that scipy would run out of memory; this seems like it would be an issue with the model rather than the optimizer, since the function being evaluated is the same in either case. Since my recent PR #207, by default gen_candidates_scipy should use L-BFGS-B instead of SLSQP, which scales quite well in the dimension of the problem. I have "solved" some research-level problems in 25000 dimensions with this and it worked quite well. Maybe try gen_candidates_scipy for your problem based on the most recent master?

Balandat avatar Jul 15 '19 16:07 Balandat

You are right since #207 it is hard to contrive an example where gen_candidates_scipy performs worse/slower/whatever!

It seems to be about the same now, and I don't need to manually specify maxiter anymore. It might still be worth adding the option to switch to gen_candidates_torch, plus a change in check_convergence to keep going until the loss is stable instead of a fixed 50 iterations (with some higher maximum), but it doesn't seem too necessary.
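For reference, such a loss-stability rule could look something like this. A minimal sketch; loss_is_stable is a hypothetical helper, not BoTorch's actual check_convergence:

```python
def loss_is_stable(loss_history, window=5, rtol=1e-4):
    """Stop once the loss has changed by less than rtol (relative to the
    older value, floored at 1.0) over the last `window` iterations."""
    if len(loss_history) < window + 1:
        return False
    old, new = loss_history[-window - 1], loss_history[-1]
    return abs(new - old) <= rtol * max(abs(old), 1.0)

# Example: a geometrically decaying loss triggers the check well before
# any fixed iteration cap is reached.
losses = [1.0]
while not loss_is_stable(losses) and len(losses) < 500:
    losses.append(losses[-1] * 0.5 + 1e-6)  # stand-in for one optimizer step
print(len(losses))
```

A production version would likely also want an absolute tolerance and a hard iteration ceiling, which is the "some higher maximum" mentioned above.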

I'm curious, are you using any dimensionality reduction techniques here (DKL, random projections...)? Seems like a vanilla GP won't work very well in this high a dimension.

I usually use a small NN for it.

If you are interested, here's a quickly put-together comparison of gen_candidates_scipy and gen_candidates_torch; they mostly perform equally well here, but this isn't exactly a real-world example.

Tenoke avatar Jul 16 '19 12:07 Tenoke

You are right since #207 it is hard to contrive an example where gen_candidates_scipy performs worse/slower/whatever!

Great.

Thanks also for the example. One thing to note is that the function you model is quite simple, so the UCB response surface won't be overly complicated (UCB usually isn't; it's more with acquisition functions like EI, which is often zero in large parts of the space). I would suspect that on higher-dimensional, more complex problems the scipy version will do a better job.

Note also that the initialization heuristic in gen_batch_initial_conditions is quite important for selecting good initial conditions. If you want to improve your solution with relatively little computational overhead, you can increase raw_samples; that exploits the fast batch evaluation of the acquisition function and so scales pretty well.
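The idea can be illustrated with a toy version. Everything below (gen_initial_conditions_sketch, the quadratic acquisition) is hypothetical; the real gen_batch_initial_conditions uses a Boltzmann-style weighted selection rather than a plain top-k:

```python
import random
from typing import Callable, List, Sequence, Tuple

def gen_initial_conditions_sketch(
    acq: Callable[[List[float]], float],
    bounds: Tuple[Sequence[float], Sequence[float]],
    num_restarts: int = 5,
    raw_samples: int = 500,
    seed: int = 0,
) -> List[List[float]]:
    """Toy version of the heuristic: evaluate the acquisition on many cheap
    random points (a fast batch evaluation in the real code) and keep the
    best ones as starting points for gradient-based refinement."""
    rng = random.Random(seed)
    lo, hi = bounds
    raw = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(raw_samples)]
    raw.sort(key=acq, reverse=True)  # ranking is cheap compared to optimizing
    return raw[:num_restarts]

# Toy acquisition with its peak at the center of the unit square.
acq = lambda x: -((x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2)
ics = gen_initial_conditions_sketch(acq, ([0.0, 0.0], [1.0, 1.0]))
```

Increasing raw_samples only adds cheap batch evaluations, while the expensive gradient-based phase still starts from just num_restarts points, which is why the knob scales well.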

It might still be worth it to add in the option to switch to gen_candidates_torch + a change in check_convergence to keep going until the loss is stable instead of fixed 50 iterations.

Yeah I think I'd want to think a little more thoroughly about convergence criteria and default options before adding the option in. We just haven't had the need for it so far. I'll leave this issue open for that purpose though.

Balandat avatar Jul 16 '19 14:07 Balandat

wrong reference in the PR above, sorry for the confusion

Balandat avatar Aug 07 '19 16:08 Balandat

Scipy will either run out of memory or take more days to complete than makes sense for my use case, as you'd expect.

it's somewhat surprising that scipy would run out of memory,

FWIW, gen_candidates_scipy will run out of memory if method == "SLSQP" and initial_conditions is a high-dimensional tensor. SLSQP also gets extremely slow as the size of initial_conditions increases (before running into the memory issues). With the current defaults, this should only be an issue when optimizing a one-shot acquisition function with constraints. Figured I'd note it here since the issue is still open.

E.g., in the following, initial_conditions has 45,000 elements, and SLSQP tries to allocate 128 GB of memory (presumably for the Hessian). I looked into it, and this happens in scipy's call to the SLSQP Fortran engine.

import torch
from botorch.models import SingleTaskGP
from botorch.acquisition import qKnowledgeGradient
from botorch import gen_candidates_scipy


dim = 6
model = SingleTaskGP(torch.rand(50, dim), torch.rand(50, 1))
qkg = qKnowledgeGradient(model=model)
initial_conditions = torch.rand(100, 75, dim)

gen_candidates_scipy(
    initial_conditions=initial_conditions,
    acquisition_function=qkg,
    lower_bounds=torch.zeros(dim),
    upper_bounds=torch.ones(dim),
    # inequality_constraints=[(torch.tensor([0, 1]), torch.tensor([-1.0, -1.0]), -1.0)],
    options={"method": "SLSQP", "maxiter": 10},
)
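A rough back-of-envelope check is consistent with an allocation of this magnitude, assuming SLSQP's workspace holds several dense n x n double-precision blocks for n flattened variables (an assumption about the Fortran implementation, not a measurement):

```python
# Rough estimate: SLSQP operates on the flattened tensor, and its dense
# least-squares workspace scales as O(n^2) in the number of variables n.
n = 100 * 75 * 6  # number of elements in the initial_conditions tensor above
gb_per_dense_matrix = n * n * 8 / 1e9  # float64 = 8 bytes
print(n, round(gb_per_dense_matrix, 1))  # 45000 variables, ~16.2 GB per block
```

One dense n x n float64 block is already ~16 GB, so a workspace holding a handful of such blocks reaches the ~128 GB range reported above. L-BFGS-B, by contrast, keeps only a small history of length-n vectors, which is why it scales so much better in the dimension.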

saitcakmak avatar Nov 19 '20 19:11 saitcakmak

In light of this, it would be really nice to have (i) forward-mode differentiation in PyTorch (https://github.com/pytorch/pytorch/issues/10223) so we can efficiently implement Hessian computations, and (ii) the ability to directly return the Hessian in sparse format so that solvers supporting sparse Hessians can make use of it.

Balandat avatar Nov 19 '20 21:11 Balandat

Closing as stale (joint_optimize is deprecated).

esantorella avatar Jan 30 '23 19:01 esantorella