Same observation being generated
Hi,
I tried to run the code below to optimize an XGBoost classifier, but I get stuck with the same observation being tested every time. I expected new observations to be generated... or am I wrong?
Console output (after the initial points are generated). Notice that every iteration generates the same observation:
('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration: 1 | Last sampled value: -0.680226 | with parameters: {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
| Current maximum: -0.245901 | with parameters: {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
| Time taken: 0 minutes and 10.953415 seconds
('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration: 2 | Last sampled value: -0.680226 | with parameters: {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
| Current maximum: -0.245901 | with parameters: {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
| Time taken: 0 minutes and 10.790525 seconds
('XGB', {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0})
Iteration: 3 | Last sampled value: -0.680226 | with parameters: {'num_round': 20.0, 'subsample': 0.25, 'eta': 0.01, 'colsample_bytree': 0.25, 'max_depth': 2.0}
| Current maximum: -0.245901 | with parameters: {'num_round': 28.712248896201515, 'subsample': 0.88492808306639748, 'eta': 0.78136949498158781, 'colsample_bytree': 0.99625386365127699, 'max_depth': 5.3806033554623252}
| Time taken: 0 minutes and 10.6884 seconds
Full code for the program (it uses the xgboost library):
import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.datasets import make_classification

# Synthetic binary classification problem
X, y = make_classification(n_samples=2500, n_features=45, n_informative=12, n_redundant=7, n_classes=2, random_state=42)

# Cross-validated objective: negated mean log-loss, since the optimizer maximizes
def xgbcv(max_depth, eta, colsample_bytree, subsample, num_round):
    print("XGB", locals())
    dtrain = xgb.DMatrix(X, label=y)
    params = {
        'booster': 'gbtree',
        'objective': 'multi:softprob',
        'silent': 1,
        'max_depth': int(round(max_depth)),
        'eta': eta,
        'colsample_bytree': colsample_bytree,
        'subsample': subsample,
        'num_class': 2,
        'eval_metric': 'mlogloss',
        'seed': 42
    }
    r = xgb.cv(params, dtrain, int(round(num_round)), nfold=4, metrics={'mlogloss'}, seed=45, show_stdv=False)
    return -r['test-mlogloss-mean'].mean()

xgbBO = BayesianOptimization(xgbcv, {
    'max_depth': (2, 6),
    'eta': (0.01, 0.8),
    'colsample_bytree': (0.25, 1.0),
    'subsample': (0.25, 1.0),
    'num_round': (20, 30),
}, verbose=True)

xgbBO.maximize(init_points=32, n_iter=6)
Thanks in advance!
Ahh, I've seen this (and promptly ignored it) in the past. I believe it is a combination of UCB's obsession with edges, the bounded optimization of the acquisition function going slightly over the specified bounds, and some rounding errors.
If you change your acquisition function to Expected Improvement, you should be ok:
xgbBO.maximize(init_points=2, n_iter=10, acq='ei')
However, I will investigate this further. I've been meaning to fix this for a while now, and I guess now is the time.
Thanks for raising the issue.
Scratch that!
This might be due to the function value being negative ==> UCB > 0 is not always true, which seems to lead to trouble (not sure why yet, will investigate).
If you change your objective to:
return 1 - r['test-mlogloss-mean'].mean()
UCB should work just fine.
Hi fmfn! Thanks for your fast reply!
It seems that the culprit was the negative values of the score. I applied the change you suggested and the algorithm started a valid search (although I replaced it with return 1 - r.iloc[-1, 0], as I only needed the last value).
If you have time to work on your lib, I think a good addition would be a minimize function too.
[]'s
Hi fmfn,
I am having a similar issue, but using expected improvement. I am using XGBoost reg:linear with booster:gblinear and the optimization gets stuck at the same values while showing the following warning repeatedly: Warning: Test point chose at random due to repeated sample.
I have been using bayes_opt for months now, but this is the first time I've encountered this problem on a consistent basis. Do you know what might be causing it?

There's a bug with the proxy optimization done on the acquisition function, which is particularly severe with expected improvement. I'm not sure what is causing it yet, unfortunately, but the scipy optimizer is failing to find the maximum of the acquisition function at times.
Have you tried UCB? It tends to work better overall and is more robust against this bug. Also, I suggest adding a little bit of noise to the GP (the alpha parameter, which you can pass with gp_params = {'alpha': ...}; you can use the variation of your CV score to guide your choice of value), which makes the whole problem much better defined. Another thing to try is playing with the xi parameter.
ps: This assumes you are running scikit-learn 0.18; otherwise the noise parameter is called nugget.
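To make that concrete, here is a minimal sketch using the xgbBO optimizer from the opening post (the alpha of 1e-3 and kappa of 5 are just illustrative guesses; in practice, base alpha on the fold-to-fold variation of your CV score):

# Extra observation noise for the GP; rough value taken from the CV metric's spread.
gp_params = {'alpha': 1e-3}

# In this version of bayes_opt, extra keyword arguments to maximize are
# forwarded to the underlying Gaussian process.
xgbBO.maximize(init_points=5, n_iter=25, acq='ucb', kappa=5, **gp_params)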
Hi fmfn,
Thank you for the quick reply. With XGBoost this becomes almost a form of art. Anyhow, you are right, it seems to be more stable with UCB. When you get a chance, can you please explain your sentence:
"Also, I suggest adding a little bit of noise to the GP (alpha parameter, which you can pass with gp_params = {'alpha': ...}, and you can use the variation of your CV to guide your choice for the value), which makes the whole problem a lot more well defined. "
What's the logic behind this? If the standard deviation of your CV is low, then noise should be added to the GP fit?
Thanks again for your help.
Gaussian processes can naturally handle noisy regression (when the labels have noise); instead of assigning zero posterior variance to observed points, a noisy GP will still show some uncertainty there. The noise parameter for a GP with the RBF kernel can actually be directly interpreted as the expected standard deviation of the noise present in the labels.
Observations of the cross-validation score of a model are definitely noisy; after all, they are derived quantities calculated from the average of several individual observations (further averaged across folds). Therefore, a small change to a parameter can lead to a "relatively" large swing in the observed value (your function is technically not continuous).
When doing Bayesian optimization you are basically trying to fit a GP to the hyper-surface defined by f(ps) -> x. If this surface is noisy, your modelling of it should reflect that; it will lead to a better model that is more appropriate to the reality of your problem.
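To see what the noise term does outside of bayes_opt, here is a minimal scikit-learn sketch on made-up 1-D data (the data and alpha values are purely illustrative): with alpha near zero the posterior standard deviation at an observed point collapses to essentially zero, while a larger alpha keeps some uncertainty there.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy noisy observations of a 1-D "hyper-surface" f(ps) -> x
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(20, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=20)

for alpha in (1e-10, 0.1 ** 2):  # essentially noise-free vs. noise std of 0.1
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=alpha)
    gp.fit(X, y)
    mean, std = gp.predict(X[:1], return_std=True)
    # With the larger alpha the GP keeps a nonzero posterior std at an observed point
    print('alpha=%g -> posterior std at an observed point: %.4f' % (alpha, std[0]))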
I am running into a similar issue: while during the Initialization phase the parameter space is sampled nicely, in the Optimization phase only extreme values are tried for most parameters. I'm optimizing an R2 score, which can be negative. I tried optimizing 10 + R2 instead, because I read that negative values may be a problem. While alleviated, the obsession with the edges of the parameter space is still present. Why does the presence of negative values matter, and do you have any suggestions on how to fix this?
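As a concrete sketch of that workaround (with a placeholder estimator and synthetic data, not the actual setup):

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=42)

def objective(max_depth, n_estimators):
    model = RandomForestRegressor(max_depth=int(round(max_depth)),
                                  n_estimators=int(round(n_estimators)),
                                  random_state=42)
    r2 = cross_val_score(model, X, y, cv=4, scoring='r2').mean()
    # Shift so the value handed to the optimizer stays positive even when R2 < 0
    return 10 + r2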
I'm also seeing this "edge obsession" with some of my params (alpha, seed_thresh, and mask_thresh).
The random/given initialization points give a good sample of the space, but once I get to the maximization portion of the code, the algorithm always chooses alpha = 0 or alpha = 1 and seed_thresh/mask_thresh = 0.4 or 0.9.
I'm using UCB with kappas of 10, 5, and 1. My scores are only positive and are fairly well behaved.
Initialization
--------------------------------------------------------------------------------------------------------
Step | Time | Value | alpha | mask_thresh | min_seed_size | min_size | seed_thresh |
1 | 00m22s | 0.81479 | 0.8800 | 0.9000 | 100.0000 | 100.0000 | 0.4000 |
2 | 00m22s | 0.82484 | 0.8800 | 0.8367 | 97.0000 | 33.0000 | 0.4549 |
3 | 00m22s | 0.82484 | 0.8800 | 0.8367 | 97.0000 | 33.0000 | 0.4549 |
4 | 00m23s | 0.80596 | 0.8800 | 0.7664 | 48.5327 | 61.8757 | 0.4090 |
5 | 00m23s | 0.82962 | 0.8800 | 0.6666 | 81.5941 | 13.2919 | 0.4241 |
6 | 00m22s | 0.70743 | 0.7219 | 0.7437 | 17.5233 | 28.8181 | 0.6414 |
7 | 00m21s | 0.43979 | 0.2976 | 0.5215 | 94.8511 | 64.3517 | 0.8054 |
8 | 00m23s | 0.84768 | 0.2408 | 0.6120 | 20.9162 | 32.0568 | 0.5938 |
9 | 00m22s | 0.81603 | 0.6403 | 0.7360 | 24.6371 | 89.1438 | 0.5964 |
10 | 00m24s | 0.82895 | 0.4123 | 0.4659 | 63.0934 | 10.2661 | 0.5906 |
11 | 00m22s | 0.77536 | 0.1803 | 0.7268 | 12.2180 | 69.7986 | 0.7694 |
12 | 00m23s | 0.71786 | 0.9697 | 0.7017 | 1.7283 | 87.1418 | 0.4590 |
13 | 00m19s | 0.14442 | 0.4860 | 0.5708 | 80.0456 | 42.1833 | 0.8415 |
14 | 00m22s | 0.80979 | 0.1810 | 0.8648 | 1.5454 | 53.7144 | 0.6080 |
15 | 00m20s | 0.21012 | 0.9539 | 0.5251 | 94.5773 | 1.5600 | 0.7119 |
16 | 00m19s | 0.15580 | 0.9824 | 0.6439 | 24.2936 | 56.2465 | 0.7527 |
17 | 00m21s | 0.84999 | 0.6045 | 0.8915 | 95.3123 | 24.6991 | 0.4303 |
18 | 00m18s | 0.07305 | 0.7312 | 0.8213 | 56.5674 | 86.4971 | 0.8207 |
19 | 00m23s | 0.85359 | 0.1550 | 0.7519 | 28.8857 | 32.4800 | 0.5863 |
20 | 00m24s | 0.82244 | 0.2414 | 0.4381 | 82.6430 | 14.5005 | 0.6036 |
21 | 00m22s | 0.81988 | 0.5954 | 0.8685 | 3.5614 | 54.1788 | 0.4786 |
22 | 00m20s | 0.18643 | 0.7339 | 0.4441 | 73.3577 | 27.7940 | 0.7647 |
23 | 00m22s | 0.83862 | 0.6037 | 0.7404 | 53.4283 | 99.3464 | 0.5586 |
24 | 00m17s | 0.01051 | 0.8708 | 0.7362 | 95.7069 | 58.4163 | 0.8590 |
25 | 00m21s | 0.61327 | 0.3797 | 0.7900 | 9.6831 | 96.0789 | 0.7906 |
seeded {'max_params': {'alpha': 0.1550, 'mask_thresh': 0.7519, 'min_seed_size': 28.8857, 'min_size': 32.4800, 'seed_thresh': 0.5863}, 'max_val': 0.8536}
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
Step | Time | Value | alpha | mask_thresh | min_seed_size | min_size | seed_thresh |
26 | 00m41s | 0.68458 | 0.0000 | 0.4000 | 0.0000 | 0.0000 | 0.4000 |
27 | 00m33s | 0.84261 | 0.0000 | 0.4000 | 33.1328 | 0.0000 | 0.4000 |
28 | 00m31s | 0.46382 | 0.0000 | 0.4000 | 77.6450 | 100.0000 | 0.9000 |
29 | 00m32s | 0.85606 | 0.0000 | 0.9000 | 72.6044 | 67.5066 | 0.4000 |
30 | 00m36s | 0.85364 | 0.0000 | 0.4000 | 52.9370 | 40.6417 | 0.4000 |
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
Step | Time | Value | alpha | mask_thresh | min_seed_size | min_size | seed_thresh |
31 | 00m41s | 0.86287 | 0.0000 | 0.9000 | 100.0000 | 83.0868 | 0.4000 |
32 | 00m32s | 0.57726 | 0.0000 | 0.4000 | 35.0292 | 100.0000 | 0.9000 |
33 | 00m25s | 0.00070 | 1.0000 | 0.9000 | 51.8428 | 0.0000 | 0.9000 |
34 | 00m24s | 0.00067 | 1.0000 | 0.9000 | 16.7209 | 0.0000 | 0.9000 |
35 | 00m34s | 0.73777 | 0.0000 | 0.4000 | 0.0000 | 29.9011 | 0.4000 |
Bayesian Optimization
--------------------------------------------------------------------------------------------------------
Step | Time | Value | alpha | mask_thresh | min_seed_size | min_size | seed_thresh |
36 | 00m43s | 0.84708 | 0.0000 | 0.4000 | 41.5305 | 18.6506 | 0.4000 |
37 | 00m36s | 0.85582 | 0.0000 | 0.4000 | 59.2420 | 55.6881 | 0.4000 |
38 | 00m35s | 0.86263 | 0.0000 | 0.4000 | 86.5147 | 77.5565 | 0.4000 |
39 | 00m36s | 0.56978 | 1.0000 | 0.4000 | 0.0000 | 70.0372 | 0.4000 |
40 | 00m33s | 0.85290 | 0.0000 | 0.9000 | 14.2876 | 82.4162 | 0.4000 |
Code looks like this:
def bo_best(self):
    return {'max_val': self.Y.max(),
            'max_params': dict(zip(self.keys, self.X[self.Y.argmax()]))}

preload, seeded_objective = _make_scorable_objective(arch_to_paths, arches,
                                                     train_data_path)
preload()  # read data into memory

seeded_bounds = {
    'mask_thresh': (.4, .9),
    'seed_thresh': (.4, .9),
    'min_seed_size': (0, 100),
    'min_size': (0, 100),
    'alpha': (0.0, 1.0),
}
seeded_bo = BayesianOptimization(seeded_objective, seeded_bounds)

cand_params = [
    {'mask_thresh': 0.9000, 'min_seed_size': 100.0000, 'min_size': 100.0000, 'seed_thresh': 0.4000},
    {'mask_thresh': 0.8367, 'seed_thresh': 0.4549, 'min_seed_size': 97, 'min_size': 33},  # 'max_val': 0.8708
    {'mask_thresh': 0.8367, 'min_seed_size': 97.0000, 'min_size': 33.0000, 'seed_thresh': 0.4549},  # 'max_val': 0.8991
    {'mask_thresh': 0.7664, 'min_seed_size': 48.5327, 'min_size': 61.8757, 'seed_thresh': 0.4090},  # 'max_val': 0.9091
    {'mask_thresh': 0.6666, 'min_seed_size': 81.5941, 'min_size': 13.2919, 'seed_thresh': 0.4241},  # full dataset 'max_val': 0.9142
    # {'mask_thresh': 0.8, 'seed_thresh': 0.5, 'min_seed_size': 20, 'min_size': 0},
    # {'mask_thresh': 0.5, 'seed_thresh': 0.8, 'min_seed_size': 20, 'min_size': 0},
    # {'mask_thresh': 0.8338, 'min_seed_size': 25.7651, 'min_size': 38.6179, 'seed_thresh': 0.6573},
    # {'mask_thresh': 0.6225, 'min_seed_size': 93.2705, 'min_size': 5, 'seed_thresh': 0.4401},
    # {'mask_thresh': 0.7870, 'min_seed_size': 85.1641, 'min_size': 64.0634, 'seed_thresh': 0.4320},
]
for p in cand_params:
    p['alpha'] = .88

n_init = 2 if DEBUG else 40
seeded_bo.explore(pd.DataFrame(cand_params).to_dict(orient='list'))

# Basically just using this package for random search.
# The BO doesn't seem to help much.
seeded_bo.plog.print_header(initialization=True)
seeded_bo.init(n_init)
print('seeded ' + ub.repr2(bo_best(seeded_bo), nl=0, precision=4))

gp_params = {"alpha": 1e-5, "n_restarts_optimizer": 2}
n_iter = 2 if DEBUG else 10
for kappa in [10, 5, 1]:
    seeded_bo.maximize(n_iter=n_iter, acq='ucb', kappa=kappa, **gp_params)
    best_res = bo_best(seeded_bo)
    print('seeded ' + ub.repr2(best_res, nl=0, precision=4))
I've just run the example in the opening post, and this no longer appears to be a problem - the parameters all vary as expected. I'm not sure exactly when this got fixed, but unless there are any objections I will close this issue.