orion
fidelity_index doesn't support nested param
Describe the bug
I am running Orion with the Hydra plugin. When I use a nested param of the config as the fidelity dimension for BOHB, e.g. hydra.sweeper.params.model.trainer.max_epochs: "fidelity(low=1, high=2)", the fidelity_index gets set to "model.trainer.max_epochs", but the trial.params dict keeps the nested structure:
{'model': {'params': {'lr': 0.0001783,
                      'lr_scheduler_args': {'T_max': 72312},
                      'weight_decay': 0.01001},
           'trainer': {'max_epochs': 1.0}}}
So the lookup fails with:
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 308, in has_suggested_all_possible_values
fidelity_value = trial.params[fidelity_index]
KeyError: 'model.trainer.max_epochs'
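The mismatch can be shown with plain dictionaries, without Orion installed. The sketch below reuses the params dict from this report; get_nested is a hypothetical helper written for illustration, not part of Orion's API.

```python
from functools import reduce

# Nested params, copied from the trial.params shown in this report.
params = {
    "model": {
        "params": {
            "lr": 0.0001783,
            "lr_scheduler_args": {"T_max": 72312},
            "weight_decay": 0.01001,
        },
        "trainer": {"max_epochs": 1.0},
    }
}

# The fidelity index is a single dotted string.
fidelity_index = "model.trainer.max_epochs"


def get_nested(nested, dotted_key):
    """Hypothetical helper: follow a dot-separated key through nested dicts."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), nested)


# A direct lookup fails because the top-level dict only has the key "model".
try:
    params[fidelity_index]
except KeyError as err:
    print("KeyError:", err)

# Walking the dotted path one segment at a time finds the value.
print(get_nested(params, fidelity_index))
```

A lookup like this assumes no individual key contains a dot; a key with a literal "." in its name would need a different separator or escaping.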
Expected behavior
I'd expect either the fidelity_index to keep the nested structure somehow, or the trial.params dict to get flattened keys, something like:
{
'model.params.lr': 0.0001783,
'model.params.lr_scheduler_args.T_max': 72312,
'model.params.weight_decay': 0.01001,
'model.trainer.max_epochs': 1.0
}
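As a rough illustration of the second option, here is a minimal sketch of flattening a nested dict into dotted keys. The flatten function below is written for this example and is only an assumption about how such a conversion could look, not Orion's actual implementation.

```python
def flatten(nested, prefix=""):
    """Illustrative helper: turn nested dicts into a flat dict with dotted keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the path so far as the prefix.
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


nested_params = {
    "model": {
        "params": {
            "lr": 0.0001783,
            "lr_scheduler_args": {"T_max": 72312},
            "weight_decay": 0.01001,
        },
        "trainer": {"max_epochs": 1.0},
    }
}

flat_params = flatten(nested_params)

# With flattened keys, the dotted fidelity index resolves directly.
fidelity_value = flat_params["model.trainer.max_epochs"]
print(fidelity_value)
```

With keys in this form, a lookup such as trial.params[fidelity_index] would succeed without any change to how fidelity_index is built.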
For now I can easily avoid the issue by using a non-nested param in my config file:
hydra.sweeper.params.max_epochs: "fidelity(low=1, high=2)"
Steps to reproduce
Define a fidelity dimension with a nested param.
Environment (please complete the following information):
- OS: MacOS Sonoma 14.1.1
- Python version: 3.9
- Oríon version: 0.2.4.post1+computecanada
- Database: PickleDB
Additional context
The full error log:
[2023-12-05 08:13:00,956][HYDRA] Orion Optimizer {'type': 'bohb', 'config': {'seed': 1, 'min_points_in_model': 4, 'top_n_percent': 40, 'num_samples': 5}}
[2023-12-05 08:13:00,956][HYDRA] with parametrization {'model.params.lr': 'loguniform(1e-05, 0.01)', 'model.params.lr_scheduler_args.T_max': 'uniform(1000, 100000, discrete=True)', 'model.params.weight_decay': 'loguniform(0.01, 100)', 'model.trainer.max_epochs': 'fidelity(1, 2)'}
Traceback (most recent call last):
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 353, in clientctx
yield client
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 510, in sweep
raise e
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 507, in sweep
self.optimize(self.client)
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 525, in optimize
trials = self.sample_trials()
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 555, in sample_trials
trials = self.suggest_trials(self.n_workers())
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 434, in suggest_trials
trial = self.client.suggest(pool_size=count)
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 563, in suggest
if self.is_done:
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 167, in is_done
return self._experiment.is_done
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/core/worker/experiment.py", line 541, in is_done
self.algorithms.is_done and num_pending_trials == 0
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/core/worker/primary_algo.py", line 277, in is_done
return super().is_done or self.algorithm.is_done
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 293, in is_done
return self.has_completed_max_trials or self.has_suggested_all_possible_values()
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 308, in has_suggested_all_possible_values
fidelity_value = trial.params[fidelity_index]
KeyError: 'model.trainer.max_epochs'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/utils.py", line 466, in <lambda>
lambda: hydra.multirun(
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 162, in multirun
ret = sweeper.sweep(arguments=task_overrides)
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/orion_sweeper.py", line 79, in sweep
return self.sweeper.sweep(arguments)
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 510, in sweep
raise e
File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.9.6/lib/python3.9/contextlib.py", line 135, in __exit__
self.gen.throw(type, value, traceback)
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 355, in clientctx
client.close()
File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 828, in close
raise RuntimeError(
RuntimeError: There is still reserved trials: dict_keys(['7ba7eed37ff08c60dc9bad9341405be4'])
Release all trials before closing the client, using client.release(trial).