
Initializing SMAC with previous Random Search Trials

Open BugsBuggy opened this issue 3 years ago • 12 comments

Hello everyone, is it possible to initialize SMAC with previous random search (RS) trials as a jump start? For example, Optuna allows adding previous trials with an add_trial() method.

Another way I could think of would be to restore SMAC and provide the RS trials via runhistory.json, as done in the restore Branin example. However, the trajectory and stats would not be specified, and I think SMAC doesn't currently support this solution.

So is it possible to reuse my RS trials with SMAC? I appreciate any ideas, thanks!

BugsBuggy avatar Jan 04 '22 10:01 BugsBuggy

Hello BugsBuggy,

Indeed, you can restore SMAC as in the Branin example. The stats can be loaded and the trajectory can be copied to the new output directory (uncomment lines 88 and 89 in the Branin example). That should work for your use case. A convenience method like the one in Optuna is not provided yet.

benjamc avatar Jan 06 '22 09:01 benjamc

I think I have to be clearer: is it possible even if the previous trials were not conducted with SMAC? For example, I have a DataFrame with the configurations of previous trials and their associated performance. I want to use this information to initialize additional trials with SMAC. There is no trajectory, stats or runhistory file.

BugsBuggy avatar Jan 06 '22 09:01 BugsBuggy

Yes, this is possible. You could do it like so (without guarantees regarding side effects). Does this help you?

You can also add origin information to a configuration during creation.

Modified restore_branin.py:

import logging

logging.basicConfig(level=logging.INFO)

import os
import numpy as np

from smac.facade.smac_ac_facade import SMAC4AC
from smac.runhistory.runhistory import RunHistory
from smac.scenario.scenario import Scenario
from smac.stats.stats import Stats
from smac.utils.io.traj_logging import TrajLogger
from smac.tae import StatusType

__copyright__ = "Copyright 2021, AutoML.org Freiburg-Hannover"
__license__ = "3-clause BSD"

if "__main__" == __name__:

    # Create dummy data with an initial SMAC run (stands in for your own previous trials).
    # Initialize scenario, using runcount_limit as budget.
    original_scenario_dict = {
        'algo': 'python branin.py',
        'paramfile': 'branin/configspace.pcs',
        'run_obj': 'quality',
        'runcount_limit': 25,
        'deterministic': True,
        'output_dir': 'restore_me'}
    original_scenario = Scenario(original_scenario_dict)
    smac = SMAC4AC(scenario=original_scenario, run_id=1)
    smac.optimize()

    # Instantiate new SMAC run
    # Create scenario
    new_scenario = Scenario(
        original_scenario_dict,
        cmd_options={'runcount_limit': 50,  # overwrite these args
                     'output_dir': 'restored'})

    # Populate runhistory with custom data (e.g. from DataFrame)
    runhistory = RunHistory()
    configurations = list(smac.runhistory.config_ids.keys())
    costs = [smac.runhistory.get_cost(config) for config in configurations]
    times = [0.] * len(costs)  # add your runtimes if applicable
    status = [StatusType.SUCCESS] * len(costs)
    incumbent = configurations[np.argmin(costs)]
    for i in range(len(configurations)):
        runhistory.add(
            config=configurations[i],  # must be of type Configuration
            cost=costs[i],
            time=times[i],
            status=status[i]
        )

    # Populate stats
    stats = Stats(new_scenario)
    keys = ["submitted_ta_runs", "finished_ta_runs"]
    n_points = len(runhistory.data)
    for key in keys:
        setattr(stats, key, n_points)

    # Now we can initialize SMAC with the recovered objects and restore the
    # state where we left off. By providing stats and a restore_incumbent, SMAC
    # automatically detects the intention of restoring a state.
    smac = SMAC4AC(scenario=new_scenario,
                   runhistory=runhistory,
                   stats=stats,
                   restore_incumbent=incumbent,
                   run_id=1)
    smac.optimize()

benjamc avatar Jan 06 '22 18:01 benjamc

Yes, that's what I'm looking for, thanks! Still, where do you get branin/configspace.pcs from? Currently, I cannot reconstruct the original scenario, because 'paramfile' is missing.

Update: I used the configspace of the new trials in original_scenario, because the search space of my previous and new trials is the same. Now it looks like this: original_scenario_dict = {'deterministic': True, 'run_obj': 'quality', 'runcount_limit': 210, 'cs': self.cs}

Nonetheless, this results in the following crash at the first call of smac.optimize(): smac.tae.FirstRunCrashedException: python-BaseException First run crashed, abort. Please check your setup -- we assume that your default configuration does not crashes.

Why do we run smac.optimize() in the first place when the previous trials are not even incorporated yet? This seems to crash in my case.

BugsBuggy avatar Jan 07 '22 09:01 BugsBuggy

The first smac.optimize() call was used to get dummy data. In your case you don't have to do it as you already have that data in your DataFrame. Therefore you should call smac.optimize() after incorporating the previous trials, as you said.

Regarding the configuration space: You can either provide a paramfile like in this example or you can create your own ConfigurationSpace (see this other example).

benjamc avatar Jan 07 '22 12:01 benjamc

The initialization hopefully worked (future experiments will confirm), thanks a lot for your help and the quick answers! I converted my previous trials into a runhistory.json like SMAC does and loaded it with runhistory.load_json().
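For readers who want to try the same route, here is a minimal sketch of turning externally evaluated trials into a runhistory.json-style file. The layout (the "data", "config_origins" and "configs" keys, the entry order, and the "__enum__" status encoding) is an assumption modeled on what SMAC 1.x writes, so compare it against a runhistory.json produced by your installed version before relying on it. The helper name build_runhistory_dict and the example values are hypothetical.

```python
import json


def build_runhistory_dict(trials, seed=0):
    """Build a runhistory.json-like dict from (values, cost, runtime) triples.

    ASSUMPTION: the layout mirrors SMAC 1.x output; verify it against a file
    written by your own SMAC version.
    """
    data, configs, origins = [], {}, {}
    for config_id, (values, cost, runtime) in enumerate(trials, start=1):
        # Key: config id, instance id, seed, budget.
        key = [config_id, None, seed, 0.0]
        # Value: cost, time, status, start time, end time, additional info.
        value = [float(cost), float(runtime),
                 {"__enum__": "StatusType.SUCCESS"}, 0.0, 0.0, {}]
        data.append([key, value])
        configs[str(config_id)] = dict(values)
        origins[str(config_id)] = "Previous random search"
    return {"data": data, "config_origins": origins, "configs": configs}


# Hypothetical previous trials: (hyperparameter values, cost, runtime).
previous_trials = [
    ({"x0": -1.5, "x1": 3.0}, 12.4, 0.0),
    ({"x0": 2.0, "x1": 7.5}, 30.1, 0.0),
]
runhistory_json = json.dumps(build_runhistory_dict(previous_trials), indent=2)
```

The resulting string can be written to a file and then loaded with runhistory.load_json(), which in SMAC 1.x also expects the configuration space as an argument.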

BugsBuggy avatar Jan 07 '22 15:01 BugsBuggy

My experiments also confirm that the initialization is successful. However, I came across a special case and wonder if/how one can handle two different configuration spaces. Say the configuration space of the initialized trials differs a bit from the search space I want to perform a search over afterwards. Can SMAC handle this case?

What I tried so far is defining two separate ConfigSpaces and swapping the smac.scenario.cs attribute after initializing with the previous RS trials. But when I exclude some parameters in the new ConfigSpace for the search, SMAC takes samples for the excluded ones, too. Taking the same config space (for initialization and the search afterwards) and modifying it after initialization gives the same result. The undesired parameters are still sampled, even though they were excluded from the ConfigSpace. Any ideas?

BugsBuggy avatar Jan 11 '22 08:01 BugsBuggy

SMAC cannot natively handle one configuration space for the initial design and a different one for the search. What you could do is transform the configurations from your initial trials into the new configuration space, so that you start with your final configuration space right away. This should be feasible especially if your initial configuration space has more parameters than your new one. Could you try that?

I could also imagine that changing the configuration space after the initialization works if only your bounds are changing.
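The transformation suggested above can be sketched in plain Python, independent of SMAC. This is only an illustration under simplifying assumptions: the new space is described by a hypothetical dictionary that maps each parameter name to either numeric bounds or a set of allowed choices, and conditional (inactive) parameters are not handled. Parameters that no longer exist are dropped, and trials whose values are invalid in the new space are discarded.

```python
def project_trial(values, new_space):
    """Restrict one trial to new_space, or return None if it is invalid there.

    new_space maps a parameter name to a (low, high) tuple of numeric bounds
    or to a set of allowed categorical choices (hypothetical representation).
    """
    projected = {}
    for name, domain in new_space.items():
        if name not in values:
            return None  # the old trial never set this parameter
        value = values[name]
        if isinstance(domain, tuple):
            low, high = domain
            if not low <= value <= high:
                return None  # outside the new bounds
        elif value not in domain:
            return None  # categorical choice no longer available
        projected[name] = value
    return projected


# Hypothetical new space and previous trials as (values, cost) pairs.
new_space = {"lr": (1e-4, 1e-1), "optimizer": {"adam", "sgd"}}
old_trials = [
    ({"lr": 0.01, "optimizer": "adam", "momentum": 0.9}, 0.21),
    ({"lr": 0.05, "optimizer": "rmsprop", "momentum": 0.5}, 0.35),
    ({"lr": 0.5, "optimizer": "sgd", "momentum": 0.1}, 0.40),
]

kept_trials = []
for values, cost in old_trials:
    projected = project_trial(values, new_space)
    if projected is not None:
        kept_trials.append((projected, cost))
```

Here only the first trial survives: the second uses a removed choice and the third lies outside the new bounds. The surviving dictionaries would still have to be turned into Configuration objects of the final configuration space before being added to the runhistory.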

benjamc avatar Jan 13 '22 08:01 benjamc

Thanks for the answer! Then I would have to exclude some trials because they have invalid parameter choices, for example when some choices for categorical variables are contained in the initial design but not in the final one. Also, I wanted to test whether there is some "transfer" of learning between categorical variables. That's only possible when I adapt my final configuration space to the initial one.

If SMAC cannot handle two search spaces, indeed the only option I'm left with is the one you suggested and then I'm not able to test for knowledge transfer, right?

BugsBuggy avatar Jan 13 '22 08:01 BugsBuggy

Maybe converting your invalid choices into constant HPs could be a solution.

Could you elaborate on the transfer question, maybe add an example? What do you mean by 'test for knowledge transfer'?

benjamc avatar Jan 13 '22 09:01 benjamc

I already tried to change them into constants. SMAC only allows adding a new hyperparameter, e.g. constant = h.Constant('relevant_var', 'choice1') followed by self.cs.add_hyperparameter(constant), and prohibits modifying the existing ConfigSpace via variable assignments.

I also tried to change the possible choices for categorical HPs: self.smac.scenario.cs._hyperparameters['relevant_var'].choices = (tuple(['choice1'])). Either way, SMAC samples from all choices of the initial design. Maybe, once initialized, SMAC uses a different ConfigSpace based on the runhistory or some other internal stats, and not the one in scenario.cs?

Sure, the knowledge transfer refers to an HP search for categorical variables with conditions. Say I have a categorical variable with 2 choices ['a', 'b']. Both 'a' and 'b' have some shared and some different additional HPs, depending on which one we choose (conditional variables). Now I have some trials for the whole search space ('a' and 'b'). Since initialization is cheap, I want to test whether initializing with the whole search space followed by a search only for choice 'a' performs better than initializing only with 'a' and searching for 'a' afterwards. I want to test whether 'a' can learn from 'b', because some HPs are shared. Does my example clarify it?

BugsBuggy avatar Jan 13 '22 10:01 BugsBuggy

If SMAC prohibits modifying existing configuration spaces, could you create a new one instead of changing the existing one?

Maybe you need to change smac.solver.scenario.cs or smac.solver.config_space, and maybe reinitialize smac.solver.epm_chooser. I am unsure whether the scenario is passed as a reference or as a copy. Maybe I also missed a place where the configuration space is used. All in all, changing the configuration space in between is hacky because SMAC is not built with the intention to do so and it cannot be guaranteed that SMAC then works reliably. The question is whether the compromise of dropping some trials is viable.

Your test would work if you could disable 'b' after the initial design phase covering 'a' and 'b'. That depends on the workaround from my answer above, which I cannot guarantee. However, if you already know that only 'a' is important, it might be better to put all resources into optimizing 'a', because the shared HPs are of course also included in 'a'. Optimizing 'b' might mean something different from optimizing 'a', although they might share some sub-HPs.

benjamc avatar Jan 27 '22 08:01 benjamc

Update regarding the initial question: It is easily possible to add previously evaluated configs via the tell interface. However, we still do not support changing configuration spaces and assume the configuration space to be static.
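As a rough illustration of the tell interface mentioned above, the sketch below converts DataFrame-style records into (values, cost) pairs and reports them to an optimizer. It assumes SMAC 2.x, where TrialInfo and TrialValue can be imported from smac.runhistory and facades expose a tell() method; the helper names rows_to_trials and tell_previous_trials, the "cost" column name, and the example values are hypothetical. Seed handling and intensifier settings may need adjusting for your setup.

```python
def rows_to_trials(rows, cost_key="cost"):
    """Split each record into (hyperparameter values, cost)."""
    trials = []
    for row in rows:
        values = {k: v for k, v in row.items() if k != cost_key}
        trials.append((values, float(row[cost_key])))
    return trials


def tell_previous_trials(optimizer, configspace, trials, seed=0):
    """Report previously evaluated trials to a SMAC 2.x facade via tell().

    ASSUMPTION: SMAC 2.x and a recent ConfigSpace are installed; the imports
    are local so that rows_to_trials stays usable without them.
    """
    from ConfigSpace import Configuration
    from smac.runhistory import TrialInfo, TrialValue

    for values, cost in trials:
        config = Configuration(configspace, values)
        optimizer.tell(TrialInfo(config=config, seed=seed),
                       TrialValue(cost=cost))


# Hypothetical records, e.g. obtained with df.to_dict("records").
rows = [
    {"x0": -1.5, "x1": 3.0, "cost": 12.4},
    {"x0": 2.0, "x1": 7.5, "cost": 30.1},
]
trials = rows_to_trials(rows)
```

With a facade built as usual, calling tell_previous_trials(smac, scenario.configspace, trials) before smac.optimize() should make the earlier evaluations available to the surrogate model. All configurations must be valid in the single, static configuration space used for the run.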

benjamc avatar Mar 29 '23 09:03 benjamc

Just clarifying: did you mean to say "it is not easily possible"? If so, is there a reason why?

eddiebergman avatar Mar 29 '23 12:03 eddiebergman

Typo, edited my answer! It is easily possible

benjamc avatar Mar 29 '23 13:03 benjamc