
Bugfix for the Proximal acquisition function wrapper with negative base acquisition functions

Open roussel-ryan opened this issue 3 years ago • 7 comments

Motivation

This PR fixes a major issue when using ProximalAcquisitionFunction with base acquisition functions that are not strictly positive. The fix applies a Softplus transformation to the base acquisition function values (controlled by an optional beta = 1.0 argument) before multiplying by the proximal weighting.
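For illustration, a minimal sketch of the value being computed (the helper name and the exact form of the proximal weighting below are assumptions for this sketch, not the actual implementation):

```python
import torch
from torch.nn.functional import softplus

def proximal_weighted_value(base_acq_value, x, x_last, proximal_weights, beta=1.0):
    # Softplus keeps the base acquisition value strictly positive, even when the
    # underlying acquisition function returns negative values.
    positive_acq = softplus(base_acq_value, beta=beta)
    # Quadratic distance to the most recent candidate, scaled by proximal_weights.
    diff = x - x_last
    quad = torch.sum(diff ** 2 / proximal_weights, dim=-1)
    # Gaussian-style proximal weighting, multiplied against the positive value.
    return positive_acq * torch.exp(-0.5 * quad)
```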

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

Tests have been updated with correct (softplus transformed) values.

roussel-ryan avatar Oct 10 '22 19:10 roussel-ryan

Codecov Report

Merging #1447 (5a6ceaf) into main (0277720) will not change coverage. The diff coverage is 100.00%.

:exclamation: Current head 5a6ceaf differs from pull request most recent head 213be32. Consider uploading reports for the commit 213be32 to get more accurate results

@@            Coverage Diff            @@
##              main     #1447   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          124       124           
  Lines        11548     11554    +6     
=========================================
+ Hits         11548     11554    +6     
Impacted Files                       Coverage Δ
botorch/acquisition/proximal.py      100.00% <100.00%> (ø)


codecov[bot] avatar Oct 10 '22 19:10 codecov[bot]

@saitcakmak has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Oct 10 '22 19:10 facebook-github-bot

This is a higher level design question, rather than a comment on the particular fix.

@roussel-ryan, is there a reason against simply adding a quadratic to an arbitrary acquisition function? With this fix, ProximalAcquisitionFunction will compute

log(1+exp(acquisition(x))) * exp(-Q(x)),

where Q(x) = (x - x_{n-1}).t() * proximal_weights^{-1} * (x - x_{n-1}). Similar biasing could be achieved with the simpler expression

acquisition(x) - Q(x).

In addition, the latter would not change the relative magnitude of the acquisition function with respect to the proximal term when the acquisition value is negative. If the acquisition takes a large negative value, softplus might also lead to numerical instabilities. Would love to hear your thoughts!
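For concreteness, a small numeric sketch of the two options (the values here are made up purely for illustration):

```python
import torch
from torch.nn.functional import softplus

acq = torch.tensor(-3.0)  # base acquisition value; can be negative for some acqfs
Q = torch.tensor(0.5)     # quadratic proximal penalty Q(x) at the same candidate

multiplicative = softplus(acq) * torch.exp(-Q)  # this PR: softplus, then weight
additive = acq - Q                              # alternative: subtract the quadratic

print(multiplicative.item(), additive.item())
```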

SebastianAment avatar Oct 10 '22 20:10 SebastianAment

Thanks for fixing this. One thing that we should keep in mind here is that if the acquisition function values are small (as is often the case later during the optimization for things like Expected Improvement) this could substantially impact the behavior for functions that are indeed guaranteed to be positive. So in that case it seems that the choice of a large beta would be necessary (and in general this highlights the importance of the choice of beta). Would it make sense to allow passing in beta as optional and fall back to the current behavior?
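To make the concern concrete, a quick sketch (with made-up values) of how softplus distorts small positive acquisition values unless beta is large:

```python
import torch
from torch.nn.functional import softplus

small_vals = torch.tensor([1e-3, 1e-2, 1e-1])  # e.g. late-stage EI values
for beta in (1.0, 10.0, 100.0):
    # softplus(x, beta) = log(1 + exp(beta * x)) / beta; for beta = 1 small positive
    # values get inflated toward log(2) ~= 0.69, while a large beta keeps them
    # close to their original magnitude.
    print(beta, softplus(small_vals, beta=beta))
```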

Balandat avatar Oct 10 '22 21:10 Balandat

@SebastianAment In response to large negative values for the base acquisition function, I'm hoping that using a Standardize outcome transform will prevent extreme values. It's not a perfect solution, but I think the proposed fix is sufficient for most cases. As for adding the biasing instead of multiplying: you mentioned that the relative scale of each term would remain the same, but I think that prevents the biasing from working when the acquisition function is large (or vice versa). Unless I'm misunderstanding what you wrote?

roussel-ryan avatar Oct 10 '22 21:10 roussel-ryan

@Balandat I could potentially remove the transform when all of the acquisition function values are found to be positive. Alternatively, I could set a large beta as the default or raise an error when the acqf values are negative. We need to prevent it from ever being used with negative acqf values, because the issue can be quite easy to overlook.

roussel-ryan avatar Oct 10 '22 21:10 roussel-ryan

As for adding the biasing instead of multiplying: you mentioned that the relative scale of each term would remain the same, but I think that prevents the biasing from working when the acquisition function is large (or vice versa)

That makes sense. The multiplicative version lets us choose weights proportional to the scale of the acquisition functions, which we might not know a priori but could update during a run. It's also equivalent to the additive version by taking the log of the acquisition function, and that might be more easily applied to negative acquisition values. Thanks for the quick response, enjoyed reading your paper!

SebastianAment avatar Oct 10 '22 21:10 SebastianAment

I am not sure how widely used this is. The main thing I'm worried about is that the default behavior for non-negative but small-in-magnitude acquisition functions may change quite a bit with this change. How about we (i) make beta optional and if None do not apply the softplus, and (ii) raise an error if any of the acquisition values are negative, pointing the user to set a beta value?
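A sketch of the behavior being proposed here (hypothetical helper, not the final code):

```python
from typing import Optional

import torch
from torch.nn.functional import softplus

def transform_acq_values(values: torch.Tensor, beta: Optional[float]) -> torch.Tensor:
    if beta is None:
        # (i) beta=None falls back to the current behavior: no softplus applied ...
        if (values < 0).any():
            # (ii) ... but negative values break the multiplicative weighting,
            # so raise and point the user toward setting a beta value.
            raise RuntimeError(
                "Proximal weighting requires positive acquisition values; "
                "specify `beta` to apply a softplus transform."
            )
        return values
    return softplus(values, beta=beta)
```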

Balandat avatar Oct 16 '22 04:10 Balandat

@Balandat I made the change you suggested.

roussel-ryan avatar Oct 17 '22 15:10 roussel-ryan

Updated w/suggestions. Thanks for the comments!

roussel-ryan avatar Oct 18 '22 14:10 roussel-ryan

@saitcakmak has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Oct 18 '22 15:10 facebook-github-bot