
GP with UCB bug?

Open davidenitti opened this issue 8 years ago • 13 comments

I am seeing weird behaviour of the GP with UCB (with noisy data): the posterior of the GP changes completely after one more point, e.g. from [image: ucb8 xi 0 0 k 11 0t] to [image: ucb9 xi 0 0 k 11 0t]

and from [image: ucb11 xi 0 0 k 11 0t] to [image: ucb12 xi 0 0 k 11 0t]

Another weird behavior is that in some cases UCB does not select the upper bound (the last selected point is in yellow). I would expect -2 to be selected, but it is not: [image: ucb12 xi 0 0 k 11 0t]

[image: ucb13 xi 0 0 k 11 0t]

[image: ucb14 xi 0 0 k 11 0t]

Is this a bug, or is this normal?

davidenitti avatar Mar 22 '16 12:03 davidenitti

What kappa value are you using? 11?

fmfn avatar Mar 22 '16 15:03 fmfn

Yes, 11. I know it's a high value, but I wanted to explore more. The function I have is noisy, and the EI acquisition function does not always explore enough, so I use UCB with a high kappa. (The xi value seems to depend on the y values of the function, so I don't want to use it.)

davidenitti avatar Mar 22 '16 16:03 davidenitti

I would expect the selected point to be the one with the highest utility. This happens with UCB, but sometimes it doesn't, as in this case; apparently that's when kappa is high. With the EI function the selected point is never the one with the highest utility. Why?

davidenitti avatar Mar 22 '16 16:03 davidenitti

That's fine. I suspect the reason what you see plotted doesn't match what is actually chosen is that you are plotting the 95% C.I. while exploring with kappa=11 (as opposed to 1.96).

Now, regarding the wild shape changes of the GP, I guess you will have to live with them; it is what it is. You may want to increase the search for good GP kernel parameters within the GP object itself to see if it leads to more stability.
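To make the first point concrete, here is a tiny self-contained sketch (plain NumPy with invented mean/std curves, not anything from the package) of why the shaded 95% band and the UCB choice can disagree: the band is drawn at 1.96 standard deviations, while the acquisition is maximized at kappa standard deviations, so with kappa=11 the sigma term can dominate and pull the argmax somewhere else.

import numpy as np

# Hypothetical posterior: mean peaks in the interior, uncertainty grows toward the edges
x = np.linspace(-7, -2, 1000)
mu = -np.abs(x + 4.5)
sigma = 0.05 + 0.1 * (x + 4.5) ** 2

band_95 = mu + 1.96 * sigma   # upper edge of the plotted 95% confidence band
ucb_k11 = mu + 11.0 * sigma   # what UCB with kappa=11 actually maximizes

print(x[np.argmax(band_95)])  # near the interior peak of the mean (about -4.5)
print(x[np.argmax(ucb_k11)])  # at a domain edge, where sigma dominates

For the second point, if the backend is the pre-0.18 scikit-learn GaussianProcess, one knob worth trying (treat the exact key as an assumption about that old API) is adding 'random_start': 10 to gp_params, so the maximum-likelihood search for theta is restarted from several initial values.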

fmfn avatar Mar 22 '16 17:03 fmfn

Thank you for the reply. The shape of the GP is not really an issue; it's just the behavior of UCB. I think the plot is correct, I just forgot to change the label text:

axis.fill(np.concatenate([x, x[::-1]]), np.concatenate([mu - kappa * sigma, (mu + kappa * sigma)[::-1]]), alpha=.6, fc='c', ec='None', label='95% confidence interval')

In any case, if x[np.argmax(utility)] is -2, I would expect that point to be selected (regardless of the plot). This usually happens, but for high kappa it doesn't. I noticed that this happens when x[np.argmax(utility)] is one of the limits.

davidenitti avatar Mar 22 '16 17:03 davidenitti

With the EI function the selected point is never the one with the highest utility. Why?

In any case, if x[np.argmax(utility)] is -2, I would expect that point to be selected (regardless of the plot).

Hmm, ok, this is concerning. The edges are usually challenging and sometimes confuse the underlying L-BFGS-B optimization. There's an open PR that would fix this by switching from optimization to massive random sampling, which could be a better alternative for lower-dimensional parameter spaces.
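For reference, a rough sketch of what that random-sampling alternative could look like for a single 1-D bound (this is not the PR's code, and argmax_by_sampling is an invented helper name): evaluate the acquisition on a large uniform sample inside the bounds and keep the best point, instead of handing the problem to L-BFGS-B.

import numpy as np

def argmax_by_sampling(acq_fn, lower, upper, n_samples=100000):
    # Draw many candidates inside [lower, upper] and keep the one with the highest acquisition value
    xs = np.random.uniform(lower, upper, size=(n_samples, 1))
    vals = acq_fn(xs)
    return xs[np.argmax(vals)]

# e.g., with the objects from this thread:
# x_next = argmax_by_sampling(lambda x: bo.util.utility(x, bo.gp, 0), -7.0, -2.0)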

If you could share your code I can do some poking around, see if there are any obvious bugs.

fmfn avatar Mar 22 '16 17:03 fmfn

Note that sometimes it does pick the right edge value. It should get it wrong around steps 9-10 and 14-15 (well, if there is some sampling involved the points won't be exactly those). Note that the yellow point is the last one evaluated, so the argmax of the utility should be the next yellow point (I visualize the current last point). So the yellow point and the argmax of the utility do not have to match in the same image.

import numpy as np
import matplotlib
#matplotlib.use('Agg')
import matplotlib.pyplot as plt
import math
import time
from sklearn.cross_validation import train_test_split
from sklearn.cross_validation import KFold
import random,sys
from matplotlib import gridspec
from bayes_opt import BayesianOptimization
# Refit the GP on all observed points and return posterior mean and std on the grid x
def posterior(bo, x):
    bo.gp.fit(bo.X, bo.Y)
    mu, sigma2 = bo.gp.predict(x.reshape(-1, 1), eval_MSE=True)
    return mu, np.sqrt(sigma2)
# Plot the GP posterior (top panel) and the acquisition/utility function (bottom panel)
def plot_gp(bo, x,kappa,nsplit,acq):

    fig = plt#.figure(figsize=(16, 10))
    plt.clf()
    fig.suptitle('GP nsplit {}, k: {},acq {}. {} Steps'.format(nsplit,kappa,acq,len(bo.X)), fontdict={'size':30})

    gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1]) 
    axis = plt.subplot(gs[0])
    acq = plt.subplot(gs[1])

    mu, sigma = posterior(bo,x)

    axis.plot(x, mu, '--', color='k', label='Prediction')

    # NOTE: the band is mu +/- kappa*sigma; the '95% confidence interval' label is stale (see discussion above)
    axis.fill(np.concatenate([x, x[::-1]]), 
              np.concatenate([mu - kappa * sigma, (mu + kappa * sigma)[::-1]]),
        alpha=.6, fc='c', ec='None', label='95% confidence interval')

    axis.set_xlim((x.min()-(x.max()-x.min())*0.001, x.max()+(x.max()-x.min())*0.001))
    axis.set_ylim((None,None))
    axis.set_ylabel('f(x)', fontdict={'size':20})
    axis.set_xlabel('x', fontdict={'size':20})

    utility = bo.util.utility(x.reshape((-1, 1)), bo.gp, 0)
    acq.plot(x, utility, label='Utility Function', color='purple')
    acq.plot(x[np.argmax(utility)], np.max(utility), '*', markersize=15, 
             label=u'Next Best Guess', markerfacecolor='gold', markeredgecolor='k', markeredgewidth=1)
    acq.set_xlim((x.min()-(x.max()-x.min())*0.001, x.max()+(x.max()-x.min())*0.001))
    acq.set_ylim((np.min(utility)-(np.max(utility)-np.min(utility))*0.05, np.max(utility)+(np.max(utility)-np.min(utility))*0.05))
    acq.set_ylabel('Utility', fontdict={'size':20})
    acq.set_xlabel('x', fontdict={'size':20})

    axis.plot(bo.X.flatten()[:-1], bo.Y[:-1], 'o', markersize=7, label=u'Observations', color='r')
    axis.plot(bo.X.flatten()[-1:], bo.Y[-1:], 'D', markersize=9, label=u'Observations', color=(0.9, 0.9, 0.0))

    axis.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    acq.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
## load dataset
plt.rcParams['figure.figsize']=(18.0, 12.0);


plt.ion()
rng = np.random


nsplit=10
# 1-D test objective: a few smooth bumps plus a small high-frequency sine term
def target(logalpha):
    x=(logalpha+6)*3
    return ((np.exp(-(x - 2)**2) + np.exp(-(x - 6)**2/10) + 1/ (x**2 + 1)-2)*2+np.sin(x*30)*0.1)#+ np.sin(x*10)*0.1)#*1000 np.random.randn()/5


minalpha=-7.
maxalpha=-2.
bo = BayesianOptimization(target, {'logalpha': ( minalpha,maxalpha)})
bo.explore({'logalpha': [(maxalpha-minalpha)/2.,(maxalpha-minalpha)/3.]})
gp_params = {'corr': 'squared_exponential',
             'nugget': 0.01,"theta0": 1,"thetaL": 0.001, "thetaU": 2,"normalize":True}
#gp_params = {'corr': 'cubic'}
#bo.maximize(init_points=2, n_iter=0, acq='ucb', kappa=5, **gp_params)
acq='ucb'
kappa=20.
xi=0.0
print bo.bounds
bo.maximize(init_points=0, n_iter=1, acq=acq,kappa=kappa, xi=xi,**gp_params)
for i in xrange(1,301):
    bo.maximize(init_points=0,n_iter=1,acq=acq, kappa=kappa, xi=xi, **gp_params)
    x = np.linspace(minalpha,maxalpha, 1000)
    plot_gp(bo,x,kappa,nsplit,acq)
    plt.draw()
#plt.show()

print(bo.res['max'])
print(bo.res['all'])
print(bo.res)
exit()
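
One small addition that could be made to the loop above to quantify the mismatch (a sketch only, reusing the names already defined in this script): evaluate the utility on the grid before a step, take one more step, and compare the grid argmax with the point that actually got evaluated.

x_grid = np.linspace(minalpha, maxalpha, 1000)
utility_before = bo.util.utility(x_grid.reshape(-1, 1), bo.gp, 0)
suggested = x_grid[np.argmax(utility_before)]         # where the acquisition peaks on the grid
bo.maximize(init_points=0, n_iter=1, acq=acq, kappa=kappa, xi=xi, **gp_params)
chosen = bo.X.flatten()[-1]                           # the point that step actually evaluated
print(suggested, chosen, abs(suggested - chosen))     # large gaps flag the mismatch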

davidenitti avatar Mar 22 '16 17:03 davidenitti

What about "ei"? Why does the chosen point never match the argmax of the utility? Moreover, "ei" seems not to explore much. Is this normal?

davidenitti avatar Mar 22 '16 17:03 davidenitti

You can pass the xi parameter to make it explore more as seen in this notebook.
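For example, mirroring the maximize() calls used elsewhere in this thread (the xi values themselves are arbitrary):

bo.maximize(init_points=0, n_iter=1, acq='ei', xi=0.1, **gp_params)     # larger xi: more exploration
bo.maximize(init_points=0, n_iter=1, acq='ei', xi=0.0001, **gp_params)  # smaller xi: mostly exploitation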

I'll take a look at the code and try to get to the bottom of this.

fmfn avatar Mar 22 '16 18:03 fmfn

Is xi a trick, or is it actually used in Bayesian optimization with the EI acquisition function? In a paper I read that EI does not have parameters. I noticed that xi depends on the y values of the function: a simple rescaling changes the behavior. That is why I'm trying to avoid it.

davidenitti avatar Mar 23 '16 17:03 davidenitti

The modification is explained here; its origin dates way back.

And you are right, the real quantity of interest is xi/y_true_max, or something of the kind. You could alter the code to do something like that, or even dynamically change the value of xi based on the number of iterations.
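As a sketch of those two ideas (nothing here is existing package functionality, and the constants are arbitrary): tie xi to the scale of the observed y values and shrink it as the run progresses.

import numpy as np

for i in range(1, 301):
    y_scale = max(np.abs(bo.Y).max(), 1e-8)  # rough scale of the objective values seen so far
    xi_i = 0.1 * y_scale / np.sqrt(i)        # decays as the iteration count grows
    bo.maximize(init_points=0, n_iter=1, acq='ei', xi=xi_i, **gp_params)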

fmfn avatar Mar 23 '16 19:03 fmfn

I ran several tests, and apparently the optimizer rarely selects the argmax of the utility when "ei" is used. For UCB the problem seems to happen only at the edges, but for "ei" it seems more severe.

davidenitti avatar Mar 29 '16 16:03 davidenitti

@davidenitti I haven't read the comments here, but looking at your plots, and given that your output values do not seem to be within the range of a standard Normal distribution, I am a bit suspicious that this issue is related to scikit-learn's bug for computing posteriors. This bug is probably affecting all results people have been getting with the bayes_opt package, as mentioned here. It looks like scikit-learn's GP somehow assumes that the data is always within the reach of a multivariate Normal, or at least that the output values are not very far from 0!
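If that suspicion is right, one workaround to try (a sketch with hand-picked constants, reusing target, minalpha and maxalpha from the script above; this is not package functionality) is to shift and scale the objective so its values stay near 0 before they ever reach the GP, and undo the transform when reading off the optimum.

Y_SHIFT, Y_SCALE = 2.0, 0.5   # hypothetical constants picked from a few pilot evaluations

def scaled_target(logalpha):
    return (target(logalpha) + Y_SHIFT) * Y_SCALE   # keeps the GP targets roughly centered near 0

bo = BayesianOptimization(scaled_target, {'logalpha': (minalpha, maxalpha)})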

Amir-Arsalan avatar Nov 22 '19 18:11 Amir-Arsalan

Closing as a very old issue.

bwheelz36 avatar Oct 31 '22 20:10 bwheelz36