
`n_good` and `n_bad` in BOHB configuration sampler

bkj opened this issue Feb 27 '19 · 2 comments

In the paper, Eq3 says that the number of good and bad configurations should be:

N_{b,l} = max(N_min, q * N_b)
N_{b,g} = max(N_min, N_b - N_{b,l})

which means that when the number of observations is less than 2 * N_min, the good and bad observations overlap. However, in the code, we have (essentially)

train_data_good = train_configs[idx[:n_good]]                 # the n_good best configs by loss
train_data_bad  = train_configs[idx[n_good:n_good+n_bad]]     # the next n_bad configs, disjoint from the good set

which means (a) the good and bad configs never overlap, and (b) len(train_data_bad) is sometimes less than min_points_in_model.
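To make the mismatch concrete, here is a small sketch (the numbers and the int(q * N_b) rounding are illustrative, not taken from the repo; min_points_in_model plays the role of N_min):

import numpy as np

# Illustrative numbers: 8 observations, N_min = 5, q = 0.15.
N_b, N_min, q = 8, 5, 0.15

# Paper, Eq. 3: both counts are padded up to N_min,
# so the sets must overlap whenever N_b < 2 * N_min.
n_good = max(N_min, int(q * N_b))   # max(5, 1) = 5
n_bad  = max(N_min, N_b - n_good)   # max(5, 3) = 5 -> 5 + 5 > 8

# Code: the bad set starts strictly after the good set, so it
# ends up with only N_b - n_good = 3 points, below N_min.
idx = np.argsort(np.random.rand(N_b))          # stand-in for loss-sorted indices
train_data_good = idx[:n_good]                 # 5 points
train_data_bad  = idx[n_good:n_good + n_bad]   # only 3 points
print(len(train_data_good), len(train_data_bad))  # 5 3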

Does that seem right? A fix would be to change the lines in question to

train_data_good = train_configs[idx[:n_good]]

# slide the window for the bad set back so it always contains n_bad points,
# letting it overlap the good set when there are too few observations
bad_start       = min(train_configs.shape[0] - n_bad, n_good)
bad_end         = bad_start + n_bad
train_data_bad  = train_configs[idx[bad_start:bad_end]]
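To continue the illustrative numbers from above: with 8 observations and n_good = n_bad = 5, bad_start = min(8 - 5, 5) = 3, so the bad set is idx[3:8], five points that overlap the good set idx[:5] in two, consistent with the padding in Eq. 3.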

But maybe this doesn't matter? Is there some suite of experiments that we could use to make sure we still get good performance from this kind of change?

~ Ben

EDIT: Fixed the code suggestion.

EDIT 2: Actually, on second thought -- I'm not sure I understand the intention in the paper. It says to choose the n_good and n_bad "best and worst configurations, respectively." Can someone maybe elaborate?

bkj · Feb 27 '19

Hey Ben, thanks for spotting that. The second line used to be:

train_data_bad = self.impute_conditional_data(train_configs[idx[-n_bad:]])

and was changed in this merge. Honestly, I don't think it makes a difference performance-wise. This usually only affects the first few iterations (assuming your search space is not really large and your max_budget is not much larger than min_budget). I played around with this choice during development, but never found this part to actually matter much. One could have a look at the surrogate experiments from the paper in the icml_2018 branch, but I don't have the time right now to investigate that, given that I expect no measurable difference on most benchmarks.

Briefly, on the good and bad KDEs: the idea of TPE (the BO part in BOHB) is to select potentially good configurations. It does so by finding points that are close to the good ones and, hopefully, far away from the bad ones. This is quantified by the ratio p_good / p_bad. The criterion for what 'good' and 'bad' mean is somewhat arbitrary, but here we split the data into the best 15% and worst 85% with respect to the loss.
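As a rough illustration of that ratio (a minimal, made-up sketch using 1-d data and scipy's gaussian_kde, not HpBandSter's actual multivariate KDE code):

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy 1-d "configurations" with noisy losses; the optimum is near 0.3.
configs = rng.uniform(0, 1, size=100)
losses  = (configs - 0.3) ** 2 + rng.normal(0, 0.01, size=100)

# Split into the best 15% and worst 85% with respect to the loss.
order  = np.argsort(losses)
n_good = int(0.15 * len(configs))
good, bad = configs[order[:n_good]], configs[order[n_good:]]

# One KDE per set; TPE proposes the candidate maximizing p_good / p_bad.
kde_good, kde_bad = gaussian_kde(good), gaussian_kde(bad)
candidates = rng.uniform(0, 1, size=64)
ratio = kde_good(candidates) / np.maximum(kde_bad(candidates), 1e-32)
print("proposed config:", candidates[np.argmax(ratio)])  # close to 0.3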

Hope that helps.

sfalkner · Feb 27 '19

I also stumbled over this and wondered whether it is a bug. Why was the initial line ("... -n_bad ...") changed, since that one does what the paper says?

mb706 · Aug 22 '21