SMAC3 icon indicating copy to clipboard operation
SMAC3 copied to clipboard

_SuccessiveHalving implementation is wrong

Open JohannesHog opened this issue 2 years ago • 1 comments

Description

I've found an error in the implementation of successive halving which also effects Hyperband and BOHB. Specifically, the number of configurations that are run for each budget is wrong.

Steps/Code to Reproduce

I have created an example where I limited the the function calls to 13 and specified a max_budget of 9, an initial budget of 1 and an eta of 3. This leads to the following budgets: 1, 3, 9.

Code: https://gist.github.com/JohannesHog/95b5d3bed09875c814462941de296ea5

Expected Results

If everything ran correctly, successive halving would evaluate 9 configurations on budget 1, 3 configurations on budget 3 and 1 configuration on budget 9.

Actual Results

Running the code prints out how many configuration ran for each budget: it ran 11 configs on budget 1, 1 config on budget 3 and 1 config on budget 9.

Versions

I am working with smac version 1.4.0

Proposal Solution

I've spent a few days searching for the bug and found a solution for it.

Decription Bug in Code

The bug is in the function _all_config_inst_seed_pairs_launched: https://github.com/automl/SMAC3/blob/83a9bbe57e64968ee0e79d5bbedf21a1f835719e/smac/intensification/successive_halving.py#L1042 The function should check if all the configs of the current stage have been launched. It basically takes all the configurations that the current intesifier launched (running_configs) and checks if it launched as many as it should have launched in the current stage (self.n_configs_in_stage[self.stage]). The problem with that is that it also counts the configurations from the previous stages. After the 1st stage this will tell us that all configurations are launched irrespective of the amount of actual configurations that got launched.

Solution

The code should compare the configurations that the current intensifier launched in this stage with the number of configs that should be launched in this stage. We could do this by adding a filter to https://github.com/automl/SMAC3/blob/83a9bbe57e64968ee0e79d5bbedf21a1f835719e/smac/intensification/successive_halving.py#L1068 and change it to configurations_by_this_intensifier = [c for c, i, s, b in self.run_tracker if b == self.all_budgets[self.stage]].

This solves the issue for me. I hope this was helpful, I've spent a lot of time on this.

JohannesHog avatar Aug 28 '22 09:08 JohannesHog

Hi Johannes,

thank you very much for finding this out. We will discuss this internally and will change the behaviour in SMAC 2.0.

renesass avatar Aug 29 '22 13:08 renesass

Hi Johannes, I tried out your fix but it results in not evaluating any trials anymore. Did you change more code to make it run?

renesass avatar Sep 26 '22 11:09 renesass

I don't exactly know what you mean by it's not evaluating any trials anymore. I just forked smac to check if I only changed that part and yes that is the case. It also fixes my example but maybe it creates new bugs somewhere else? Here is my forked repo with the example: https://github.com/JohannesHog/SMAC3/tree/main/smac If you could give me an example of what does not work, I can look at it if that would be helpful.

JohannesHog avatar Sep 26 '22 11:09 JohannesHog

Can you fork https://github.com/automl/SMAC3/pull/875 and check if your example still run as expected? Please use this "template" to get started: https://automl.github.io/SMAC3/development-2.0/examples/2_multi_fidelity/1_mlp_epochs.html#sphx-glr-examples-2-multi-fidelity-1-mlp-epochs-py

renesass avatar Sep 26 '22 16:09 renesass

I've recreated the example that I posted for the issue and it is still not running correctly but gets fixed by uncommenting: https://github.com/automl/SMAC3/blob/18aecffae373132ce67baec54895100133500908/smac/intensifier/successive_halving_worker.py#L763

example: https://gist.github.com/JohannesHog/754c513dccfa0062754ba3affa4696fb

On a different note, this might even be intended, but your current budget calculation of Hyperband does not match that of the original Hyperband formulation or BOHB. You are rounding down two times instead of rounding up in the end. your implementation: https://github.com/automl/SMAC3/blob/18aecffae373132ce67baec54895100133500908/smac/intensifier/hyperband_worker.py#L158 implementation following the papers: n_challengers = int(np.ceil((self._s_max + 1) / (self._s + 1) * eta ** self._s))

JohannesHog avatar Sep 26 '22 22:09 JohannesHog

We decided to rewrite the intensifier completely in version 2.0.0a2. The new SH/HB will be much easier to understand without the need of the run tracker. The issue should then be solved automatically.

renesass avatar Oct 20 '22 12:10 renesass

Close as it will be solved with the release of version 2.

helegraf avatar Feb 02 '23 14:02 helegraf