SMAC3
Initializing SMAC with previous Random Search Trials
Hello everyone, is it possible to initialize SMAC with previous random search trials as a jump start? For example, Optuna allows adding previous trials with an add_trial() method.
Another way I could think of would be to restore SMAC and provide the RS trials via runhistory.json, as done in the restore Branin example. However, the trajectory and stats are not specified, and I think SMAC doesn't currently support this solution.
So is it possible to reuse my RS trials with SMAC? I appreciate any ideas, thanks!
Hello BugsBuggy,
Indeed, you can restore SMAC as in the Branin example. The stats can be loaded and the trajectory can be copied to the new output directory (uncomment lines 88 and 89 in the Branin example). That should work for your use case. A convenience method like the one in Optuna is not provided yet.
I think I have to be more clear: is it possible even if the previous trials were not conducted with SMAC? E.g. I have a DataFrame with configurations of previous trials and their associated performance. I want to use this information to initialize additional trials with SMAC. There is no trajectory, stats or runhistory file.
Yes, this is possible. You could do it like this (without any guarantee against side effects). Does this help you?
You can also add origin information to a configuration during creation.
Modified restore_branin.py:
import logging
logging.basicConfig(level=logging.INFO)
import os
import numpy as np
from smac.facade.smac_ac_facade import SMAC4AC
from smac.runhistory.runhistory import RunHistory
from smac.scenario.scenario import Scenario
from smac.stats.stats import Stats
from smac.utils.io.traj_logging import TrajLogger
from smac.tae import StatusType

__copyright__ = "Copyright 2021, AutoML.org Freiburg-Hannover"
__license__ = "3-clause BSD"

if "__main__" == __name__:
    # CREATE DUMMY DATA -
    # Initialize scenario, using runcount_limit as budget.
    original_scenario_dict = {
        'algo': 'python branin.py',
        'paramfile': 'branin/configspace.pcs',
        'run_obj': 'quality',
        'runcount_limit': 25,
        'deterministic': True,
        'output_dir': 'restore_me'}
    original_scenario = Scenario(original_scenario_dict)
    smac = SMAC4AC(scenario=original_scenario, run_id=1)
    smac.optimize()

    # Instantiate new SMAC run
    # Create scenario
    new_scenario = Scenario(
        original_scenario_dict,
        cmd_options={'runcount_limit': 50,  # overwrite these args
                     'output_dir': 'restored'})

    # Populate runhistory with custom data (e.g. from DataFrame)
    runhistory = RunHistory()
    configurations = list(smac.runhistory.config_ids.keys())
    costs = [smac.runhistory.get_cost(config) for config in configurations]
    times = [0.] * len(costs)  # add your runtimes if applicable
    status = [StatusType.SUCCESS] * len(costs)
    incumbent = configurations[np.argmin(costs)]
    for i in range(len(configurations)):
        runhistory.add(
            config=configurations[i],  # must be of type Configuration
            cost=costs[i],
            time=times[i],
            status=status[i]
        )

    # Populate stats
    stats = Stats(new_scenario)
    keys = ["submitted_ta_runs", "finished_ta_runs"]
    n_points = len(runhistory.data)
    for key in keys:
        setattr(stats, key, n_points)

    # Now we can initialize SMAC with the recovered objects and restore the
    # state where we left off. By providing stats and a restore_incumbent, SMAC
    # automatically detects the intention of restoring a state.
    smac = SMAC4AC(scenario=new_scenario,
                   runhistory=runhistory,
                   stats=stats,
                   restore_incumbent=incumbent,
                   run_id=1)
    smac.optimize()
Yes, that's what I'm looking for, thanks!
Still, where do you get branin/configspace.pcs from? Currently, I cannot reconstruct the original scenario, because 'paramfile' is missing.
Update: I used the configspace of the new trials in original_scenario, because the search space of my previous and new trials is the same. Now it looks like this:
original_scenario_dict = {'deterministic': True, 'run_obj': 'quality', 'runcount_limit': 210, 'cs': self.cs}
Nonetheless, this results in the following crash at the first call of smac.optimize():
smac.tae.FirstRunCrashedException: python-BaseException First run crashed, abort. Please check your setup -- we assume that your default configuration does not crashes.
Why do we run smac.optimize() in the first place when the previous trials are not even incorporated yet? This seems to crash in my case.
The first smac.optimize() call was only used to generate dummy data. In your case you don't need it, because you already have that data in your DataFrame. You should therefore call smac.optimize() only after incorporating the previous trials, as you said.
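As a rough sketch of preparing such DataFrame data before it goes into runhistory.add(): the snippet below assumes the rows are available as plain dicts (for example from pandas' DataFrame.to_dict('records')) and that the performance sits in a column hypothetically named 'cost'. The helper split_records is illustrative and not part of SMAC. Each resulting dict still has to be turned into a ConfigSpace Configuration, since runhistory.add() expects that type.

```python
from typing import Any, Dict, List, Tuple


def split_records(
    records: List[Dict[str, Any]], cost_column: str = "cost"
) -> Tuple[List[Dict[str, Any]], List[float], int]:
    """Split row dicts into hyperparameter dicts and costs.

    Returns the hyperparameter dicts, the costs, and the index of the
    best (lowest-cost) trial, which can serve as the restore incumbent.
    """
    # Everything except the cost column is treated as a hyperparameter.
    configs = [
        {name: value for name, value in row.items() if name != cost_column}
        for row in records
    ]
    costs = [float(row[cost_column]) for row in records]
    # SMAC minimizes, so the incumbent is the trial with the lowest cost.
    incumbent_index = min(range(len(costs)), key=costs.__getitem__)
    return configs, costs, incumbent_index


# Hypothetical previous random search trials on a Branin-like space.
records = [
    {"x1": 0.5, "x2": 2.0, "cost": 12.3},
    {"x1": -3.1, "x2": 12.2, "cost": 0.4},
    {"x1": 9.4, "x2": 2.5, "cost": 0.9},
]
configs, costs, incumbent_index = split_records(records)
print(configs[incumbent_index], costs[incumbent_index])
```

The three return values correspond to the configurations, costs and incumbent used in the modified restore_branin.py above; no preliminary smac.optimize() call is involved.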
Regarding the configuration space: You can either provide a paramfile like in this example or you can create your own ConfigurationSpace (see this other example).
The initialization hopefully worked (future experiments will confirm), thanks a lot for your help and the quick answers!
I converted my previous trials into a runhistory.json like SMAC does and loaded it with runhistory.load_json().
My experiments also confirm that the initialization is successful. However, I came across a special case and wonder if/how one can handle two different configuration spaces. Say the configuration space of the initialized trials differs a bit from the search space I want to perform a search over afterwards. Can SMAC handle this case?
What I tried so far is defining two separate ConfigSpaces and swapping the smac.scenario.cs attribute after initializing with the previous RS trials. But when I exclude some parameters in the new ConfigSpace for the search, SMAC takes samples for the excluded ones, too. Taking the same config space (for initialization and the search afterwards) and modifying it after initialization gives the same result. The undesired parameters are still sampled, even though they were excluded from the ConfigSpace. Any ideas?
SMAC cannot natively handle having one configuration space for the initial design and another one for the search. What you could do is transform the configurations from your initial trials into the new configuration space, so that you start with your final configuration space right away. That should be feasible especially if your initial configuration space has more parameters than your new one. Could you try that?
I could also imagine that changing the configuration space after the initialization works if only your bounds are changing.
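The transformation suggested above could be sketched roughly as follows, using plain dicts instead of ConfigSpace objects. All names here (project_trials, is_valid, the example hyperparameters) are hypothetical. The sketch assumes that dropping a hyperparameter which is absent from the new space leaves the recorded cost meaningful, which has to be judged per use case.

```python
from typing import Any, Dict, List, Tuple, Union

Domain = Union[set, Tuple[float, float]]  # categorical choices or (low, high)
Trial = Tuple[Dict[str, Any], float]      # (hyperparameter values, cost)


def is_valid(value: Any, domain: Domain) -> bool:
    """Check one value against a set of choices or numeric bounds."""
    if isinstance(domain, set):
        return value in domain
    low, high = domain
    return low <= value <= high


def project_trials(trials: List[Trial], new_space: Dict[str, Domain]) -> List[Trial]:
    """Map trials into new_space and discard those that do not fit."""
    kept = []
    for values, cost in trials:
        # Drop hyperparameters that no longer exist in the new space.
        projected = {name: values[name] for name in new_space if name in values}
        # Discard trials that lack a hyperparameter of the new space.
        if len(projected) != len(new_space):
            continue
        # Discard trials with a choice or value outside the new space.
        if all(is_valid(projected[name], new_space[name]) for name in new_space):
            kept.append((projected, cost))
    return kept


old_trials = [
    ({"model": "a", "lr": 0.1, "depth": 3}, 0.42),
    ({"model": "b", "lr": 0.01, "depth": 5}, 0.37),  # 'b' not in new space
    ({"model": "a", "lr": 2.0, "depth": 4}, 0.55),   # lr above new bound
]
new_space = {"model": {"a"}, "lr": (0.001, 1.0)}
projected_trials = project_trials(old_trials, new_space)
print(projected_trials)
```

Only the surviving trials would then be converted to Configuration objects of the final configuration space and added to the runhistory.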
Thanks for the answer! Then I would have to exclude some trials because they have invalid parameter choices, for example when some choices for categorical variables are contained in the initial design but not in the final one. Also, I wanted to test whether there is some "transfer" of learning between categorical variables. That's only possible when I adapt my final configuration space to the initial one.
If SMAC cannot handle two search spaces, indeed the only option I'm left with is the one you suggested and then I'm not able to test for knowledge transfer, right?
Maybe converting your invalid choices into constant HPs could be a solution.
Could you elaborate on the transfer question, maybe add an example? What do you mean by 'test for knowledge transfer'?
I already tried to change them into constants. SMAC only allows using
constant = h.Constant('relevant_var', 'choice1')
self.cs.add_hyperparameter(constant)
and prohibits modifying the existing ConfigSpace via variable assignments.
I also tried to change the possible choices for categorical HPs:
self.smac.scenario.cs._hyperparameters['relevant_var'].choices = (tuple(['choice1']))
Either way, SMAC samples from all choices of the initial design. Maybe, once initialized, SMAC uses a different ConfigSpace based on the runhistory or some other internal stats, and not the one in scenario.cs?
Sure, the knowledge transfer refers to an HP search for categorical variables with conditions. Say I have a categorical variable with 2 choices ['a', 'b']. Both 'a' and 'b' have some shared and some different additional HPs, depending on which one we choose (conditional variables). Now I have some trials for the whole search space ('a' and 'b'). Since initialization is cheap, I want to test whether initializing with the whole search space followed by a search only for choice 'a' performs better than initializing only with 'a' and searching for 'a' afterwards. I want to test whether 'a' can learn from 'b', because some HPs are shared. Does my example clarify?
If SMAC prohibits modifying existing configuration spaces, could you create a new one instead of changing the existing one?
Maybe you need to change smac.solver.scenario.cs or smac.solver.config_space, and maybe reinitialize smac.solver.epm_chooser. I am unsure whether the scenario is passed as a reference or as a copy. Maybe I also missed a place where the configuration space is used. All in all, changing the configuration space in between is hacky because SMAC is not built with the intention to do so and it cannot be guaranteed that SMAC then works reliably. The question is whether the compromise of dropping some trials is viable.
Your test would work if you could disable 'b' after the initial design phase covering 'a' and 'b'. That depends on changing the configuration space as described in my answer above, which I cannot guarantee to work. However, if you already know that only 'a' is important, it might be better to put the full resources into optimizing 'a', because the shared HPs are of course also included in 'a'. Optimizing 'b' might mean something different than optimizing 'a', although they might share some sub-HPs.
Update regarding the initial question: It is easily possible to add previously evaluated configs via the tell interface. However, we still do not support changing configuration spaces and assume the configuration space to be static.
Just clarifying: did you mean to say "it is not easily possible"? If so, is there a reason why?
Typo, edited my answer! It is easily possible.