botorch
Bugfix for the Proximal acquisition function wrapper with negative base acquisition functions
Motivation
This PR fixes a major issue when using ProximalAcquisitionFunction with base acquisition functions that are not strictly positive. The fix applies a Softplus transformation (with an optional beta argument, default 1.0) to the base acquisition function values before multiplying by the proximal weighting.
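A minimal sketch of the transformed acquisition described above (function and argument names are hypothetical; the exact proximal weighting here is an assumption -- see botorch/acquisition/proximal.py for the actual implementation):

```python
import torch

def softplus_proximal(acq_values, x, x_last, proximal_weights, beta=1.0):
    """Hypothetical sketch: softplus makes the base acquisition values
    strictly positive before they are scaled by the proximal weight."""
    positive_vals = torch.nn.functional.softplus(acq_values, beta=beta)
    diff = x - x_last
    # Q(x) = (x - x_last)^T diag(proximal_weights)^{-1} (x - x_last)
    quad = (diff * diff / proximal_weights).sum(dim=-1)
    return positive_vals * torch.exp(-quad)

# A negative base value no longer flips the sign of the weighted product.
out = softplus_proximal(
    torch.tensor([-2.0, 3.0]),
    x=torch.tensor([[0.5], [0.5]]),
    x_last=torch.zeros(1),
    proximal_weights=torch.ones(1),
)
print(out)
```

Without the softplus, the `-2.0` entry would be multiplied by a positive proximal weight and stay negative, inverting the intended bias toward the previous point.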
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
Tests have been updated with the correct (softplus-transformed) values.
Codecov Report
Merging #1447 (5a6ceaf) into main (0277720) will not change coverage. The diff coverage is 100.00%.
:exclamation: Current head 5a6ceaf differs from pull request most recent head 213be32. Consider uploading reports for the commit 213be32 to get more accurate results.
```
@@            Coverage Diff            @@
##              main     #1447   +/-  ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files          124       124
  Lines        11548     11554    +6
=========================================
+ Hits         11548     11554    +6
=========================================
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| botorch/acquisition/proximal.py | 100.00% <100.00%> (ø) | |
@saitcakmak has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This is a higher level design question, rather than a comment on the particular fix.
@roussel-ryan, is there a reason against simply adding a quadratic to an arbitrary acquisition function? With this fix, ProximalAcquisitionFunction will compute
log(1 + exp(acquisition(x))) * exp(-Q(x)),
where Q(x) = (x - x_{n-1}).t() * proximal_weights^{-1} * (x - x_{n-1}). Similar biasing could be achieved with the simpler expression
acquisition(x) - Q(x).
In addition, the latter would not change the relative magnitude of the acquisition function and the proximal term when the acquisition is negative. If the acquisition takes a large negative value, the softplus might also lead to numerical instabilities.
Would love to hear your thoughts!
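For reference, a toy numeric comparison of the two forms discussed above (pure Python; the specific numbers are illustrative assumptions, not from the PR):

```python
import math

def multiplicative(acq, q):
    # softplus(acq) * exp(-Q(x)) -- the form introduced by this PR
    return math.log1p(math.exp(acq)) * math.exp(-q)

def additive(acq, q):
    # acquisition(x) - Q(x) -- the simpler alternative
    return acq - q

# Candidate A: large acquisition value, far from the last point (Q = 5).
# Candidate B: somewhat smaller value, at the last point (Q = 0).
# The additive penalty barely changes the ranking when the acquisition
# value dominates, while the multiplicative weight can still favor the
# nearby candidate.
print(additive(100.0, 5.0) > additive(90.0, 0.0))              # True
print(multiplicative(100.0, 5.0) > multiplicative(90.0, 0.0))  # False
```

This illustrates the scale-dependence both sides of the discussion are getting at: which form biases "enough" depends on the magnitude of the acquisition values relative to Q(x).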
Thanks for fixing this. One thing that we should keep in mind here is that if the acquisition function values are small (as is often the case later during optimization for things like Expected Improvement), this could substantially impact the behavior even for functions that are guaranteed to be positive. In that case a large beta would be necessary (which, in general, highlights the importance of the choice of beta). Would it make sense to make beta optional and fall back to the current behavior when it is not given?
@SebastianAment Regarding large negative values of the base acquisition function, I'm hoping that using a Standardize outcome transform will prevent extreme values. It's not a perfect solution, but I think the proposed fix is sufficient for most cases. As for adding the biasing instead of multiplying: you mentioned that the relative scale of each term would remain the same, but I think that prevents the biasing from working when the acquisition function is large (or vice versa). Unless I'm misunderstanding what you wrote?
@Balandat I could potentially skip the transform when all of the acquisition function values are found to be positive. Alternatively, I could set a large beta as the default or raise an error when the acqf values are negative. We need to prevent it from ever being used with negative acqf values, since the issue is quite easy to overlook.
> As for adding the biasing instead of multiplying: you mentioned that the relative scale of each term would remain the same, but I think that prevents the biasing from working when the acquisition function is large (or vice versa)
That makes sense. The multiplicative version lets us choose weights proportional to the scale of the acquisition functions, which we might not know a priori but could update during a run. It's also equivalent to the additive version by taking the log of the acquisition function, and that might be more easily applied to negative acquisition values. Thanks for the quick response, enjoyed reading your paper!
I am not sure how widely used this is. The main thing I'm worried about is that the default behavior for non-negative but small-in-magnitude acquisition functions may change quite a bit. How about we (i) make beta optional and, if it is None, do not apply the softplus, and (ii) raise an error if any of the acquisition values are negative, pointing the user to set a beta value?
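The two-part proposal above might look roughly like this (function and argument names are hypothetical; the real change lives in botorch/acquisition/proximal.py):

```python
import torch

def transform_base_values(acq_values, beta=None):
    """Hypothetical sketch of the proposed behavior: with beta=None the
    values pass through unchanged but must be non-negative; otherwise a
    softplus with the given beta is applied."""
    if beta is None:
        if bool((acq_values < 0).any()):
            raise RuntimeError(
                "Proximal biasing requires non-negative base acquisition "
                "values; specify a beta to apply a softplus transform."
            )
        return acq_values
    return torch.nn.functional.softplus(acq_values, beta=beta)

print(transform_base_values(torch.tensor([0.1, 2.0])))           # passes through
print(transform_base_values(torch.tensor([-1.0, 2.0]), beta=10.0))
```

A large beta makes the softplus closely track max(0, x), so strictly positive acquisition functions are nearly unchanged, which addresses the concern about small-in-magnitude values.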
@Balandat made the change you suggested
Updated with the suggestions. Thanks for the comments!
@saitcakmak has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.