Include Bayesian sampling in Hyperband implementation
The paper "BOHB: Robust and Efficient Hyperparameter Optimization at Scale" includes an interesting parallelization technique for Bayesian sampling in a Hyperband implementation. In Section 4.2 the describe a scheme that does the following:
- Has a global hyperparameter space for Bayesian sampling. This hyperparameter space is refined over time according to the Bayesian sampling principle (see the sketch after this list).
- Initializes models in a particular order:
  - At first, initialize `num_workers` models. Train them as the most aggressive bracket of Hyperband specifies.
  - When a model is stopped, initialize a new model with parameters sampled from the current hyperparameter space estimate. This model is from the most aggressive bracket if that bracket is not complete; otherwise it's from the next most aggressive bracket.
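For intuition, here's a minimal sketch of what "refining the space" could look like. BOHB fits kernel-density estimators over good and bad configurations (TPE-style); this simplified version only fits a density to the best configurations, and every name in it (`observations`, `suggest`, the thresholds) is an illustrative assumption, not the paper's exact algorithm:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
observations = []  # (hyperparameter value, validation loss) pairs seen so far

def suggest(low=1e-4, high=1e-1):
    """Sample the next hyperparameter: random at first, model-based later."""
    if len(observations) < 15:
        # Too little data for a useful density estimate: fall back to random search.
        return rng.uniform(low, high)
    values, losses = map(np.asarray, zip(*observations))
    # Fit a density to the best-performing third of the configurations ...
    good = values[losses <= np.quantile(losses, 1 / 3)]
    kde = gaussian_kde(good)
    # ... and draw the next configuration from that refined density.
    return float(np.clip(kde.resample(1, seed=rng)[0, 0], low, high))
```

In BOHB proper the new sample is also weighed against a density of the worse configurations, and a fraction of configurations stay purely random so the whole space keeps being explored.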
The number of workers will definitely influence performance: if there are infinite workers, the Bayesian sampling algorithm will not have time to run any inference on the best set of parameters. Conversely, if there's one worker, Bayesian sampling can do as much inference as possible.
They show this performance (figure from the paper, not reproduced here):
Similar to Dask-ML's benchmark, they start saturating between 16 and 32 workers.
Sounds fun
Any updates here?
Probably not. All development happens on GitHub.
Sorry, I mean: do you plan to implement BOHB in the near future? I just hope I can use it when it's available :)
I don't believe anyone is working on it at the moment, though @stsievert might have a better idea.
> do you plan to implement BOHB in the near future?
I don't know of anyone who has a plan to implement BOHB. I have some ideas on how to implement it, but that's about it.
edit 2021-10: this would require a lot of work around initializing new models. There needs to be interplay with the different successive halving brackets, which means `_fit` needs significant reworking. I think this would best be enabled by making `_fit` a class to separate the various components; customization could then be enabled by callbacks. Here's a prototype:
```python
# Prototype sketch. sample_params, bayesian_update, hyperband_alg, and
# self.launch are placeholders, not existing Dask-ML functions.
from copy import deepcopy

from distributed import as_completed
from sklearn.base import BaseEstimator


class _HyperOpt:
    def __init__(self, initial_params, model_fn, n_models, n_workers):
        self.initial_params = initial_params
        self.model_fn = model_fn
        self.n_models = n_models
        self.n_workers = n_workers

    def start_fit(self):
        # Launch every model up front with randomly sampled parameters.
        self.launched_models = self.n_models
        return [
            self.launch(self.model_fn(**sample_params(self.initial_params)))
            for _ in range(self.n_models)
        ]

    def decision_made(self, ident: str, model: BaseEstimator, keep_training: bool):
        # Callback: Hyperband decided to promote or stop this model.
        pass

    def _fit(self):
        # Reworked version of dask_ml.model_selection._incremental._fit
        futures = self.start_fit()
        for future in as_completed(futures):
            ...
            promoted, fired = hyperband_alg()
            for ident, model in promoted:
                self.decision_made(ident, model, keep_training=True)
            for ident, model in fired:
                self.decision_made(ident, model, keep_training=False)


class _BayesianOnHyperBand(_HyperOpt):
    def start_fit(self):
        # Start with one model per worker, not one per bracket slot.
        self.params_ = deepcopy(self.initial_params)
        self.launched_models = self.n_workers
        return [
            self.launch(self.model_fn(**sample_params(self.params_)))
            for _ in range(self.n_workers)
        ]

    def decision_made(self, ident, model, keep_training):
        # Refine the hyperparameter space with this model's result ...
        self.params_ = bayesian_update(model, self.params_)
        # ... and replace each stopped model with one sampled from the
        # refined space.
        if not keep_training and self.launched_models < self.n_models:
            self.launched_models += 1
            self.launch(self.model_fn(**sample_params(self.params_)))
```
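For context, a driver might wire the prototype up roughly like this. Everything below is hypothetical glue around the sketch above, not an existing Dask-ML API, and it assumes `sample_params()` draws a concrete setting (e.g. `{"alpha": 3e-3}`) from the space:

```python
from sklearn.linear_model import SGDClassifier

# Hypothetical usage of the prototype above; none of this exists in Dask-ML.
search = _BayesianOnHyperBand(
    initial_params={"alpha": (1e-4, 1e-1)},  # the space to refine over time
    model_fn=lambda **params: SGDClassifier(**params),
    n_models=81,   # total models Hyperband will evaluate
    n_workers=16,  # initial batch size: one model per Dask worker
)
search._fit()  # run Hyperband, refining the space at every stop/promote
```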