Include Bayesian sampling in Hyperband implementation
The paper "BOHB: Robust and Efficient Hyperparameter Optimization at Scale" includes an interesting parallelization technique for Bayesian sampling in a Hyperband implementation. In Section 4.2 the describe a scheme that does the following:
- Has a global hyperparameter space for Bayesian sampling. This hyperparameter space is refined over time according to the Bayesian sampling principle (see the sketch after this list).
- Initializes models in a particular order:
  - At first, initialize `num_workers` models. Train them as the most aggressive bracket of Hyperband specifies.
  - When a model is stopped, initialize a new model with parameters sampled from the current hyperparameter space estimate. This model is from the most aggressive bracket if that bracket is not complete; otherwise it's from the next most aggressive bracket.
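For intuition, here's a minimal sketch of what "refining the space" could look like. BOHB fits kernel-density estimators over good and bad configurations (TPE-style); this simplified version only fits a density to the best configurations, and every name in it (`observations`, `suggest`, the thresholds) is an illustrative assumption, not the paper's exact algorithm:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
observations = []  # (hyperparameter value, validation loss) pairs seen so far

def suggest(low=1e-4, high=1e-1):
    """Sample the next hyperparameter: random at first, model-based later."""
    if len(observations) < 15:
        # Too little data for a useful density estimate: fall back to random search.
        return rng.uniform(low, high)
    values, losses = map(np.asarray, zip(*observations))
    # Fit a density to the best-performing third of the configurations ...
    good = values[losses <= np.quantile(losses, 1 / 3)]
    kde = gaussian_kde(good)
    # ... and draw the next configuration from that refined density.
    return float(np.clip(kde.resample(1, seed=rng)[0, 0], low, high))
```

In BOHB proper the new sample is also weighed against a density of the worse configurations, and a fraction of configurations stay purely random so the whole space keeps being explored.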
The number of workers will definitely influence performance: if there are infinite workers, the Bayesian sampling algorithm will not have time to run any inference on the best set of parameters. Conversely, if there's one worker, Bayesian sampling can do as much inference as possible.
They show this performance (figure from the paper, not reproduced here):
Similar to Dask-ML's benchmark, they start saturating between 16 and 32 workers.
Sounds fun
Any updates here?
Probably not. All development happens on GitHub.
Sorry, I mean: do you plan to implement BOHB in the near future? I just hope I can use it when it's available :)
I don't believe anyone is working on it at the moment, though @stsievert might have a better idea.
> do you plan to implement BOHB in the near future?
I don't know of anyone who has a plan to implement BOHB. I have some ideas on how to implement it, but that's about it.
edit 2021-10: this would require a lot of work around initializing new models. There needs to be interplay with the different successive halving brackets, which means `_fit` needs significant reworking. I think this would best be enabled by making `_fit` a class to separate the various components; customization could then be enabled by callbacks. Here's a prototype:
```python
# Prototype sketch. sample_params, bayesian_update, hyperband_alg, and
# self.launch are placeholders, not existing Dask-ML functions.
from copy import deepcopy

from distributed import as_completed
from sklearn.base import BaseEstimator


class _HyperOpt:
    def __init__(self, initial_params, model_fn, n_models, n_workers):
        self.initial_params = initial_params
        self.model_fn = model_fn
        self.n_models = n_models
        self.n_workers = n_workers

    def start_fit(self):
        # Launch every model up front with randomly sampled parameters.
        self.launched_models = self.n_models
        return [
            self.launch(self.model_fn(**sample_params(self.initial_params)))
            for _ in range(self.n_models)
        ]

    def decision_made(self, ident: str, model: BaseEstimator, keep_training: bool):
        # Callback: Hyperband decided to promote or stop this model.
        pass

    def _fit(self):
        # Reworked version of dask_ml.model_selection._incremental._fit
        futures = self.start_fit()
        for future in as_completed(futures):
            ...
            promoted, fired = hyperband_alg()
            for ident, model in promoted:
                self.decision_made(ident, model, keep_training=True)
            for ident, model in fired:
                self.decision_made(ident, model, keep_training=False)


class _BayesianOnHyperBand(_HyperOpt):
    def start_fit(self):
        # Start with one model per worker, not one per bracket slot.
        self.params_ = deepcopy(self.initial_params)
        self.launched_models = self.n_workers
        return [
            self.launch(self.model_fn(**sample_params(self.params_)))
            for _ in range(self.n_workers)
        ]

    def decision_made(self, ident, model, keep_training):
        # Refine the hyperparameter space with this model's result ...
        self.params_ = bayesian_update(model, self.params_)
        # ... and replace each stopped model with one sampled from the
        # refined space.
        if not keep_training and self.launched_models < self.n_models:
            self.launched_models += 1
            self.launch(self.model_fn(**sample_params(self.params_)))
```
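For context, a driver might wire the prototype up roughly like this. Everything below is hypothetical glue around the sketch above, not an existing Dask-ML API, and it assumes `sample_params()` draws a concrete setting (e.g. `{"alpha": 3e-3}`) from the space:

```python
from sklearn.linear_model import SGDClassifier

# Hypothetical usage of the prototype above; none of this exists in Dask-ML.
search = _BayesianOnHyperBand(
    initial_params={"alpha": (1e-4, 1e-1)},  # the space to refine over time
    model_fn=lambda **params: SGDClassifier(**params),
    n_models=81,   # total models Hyperband will evaluate
    n_workers=16,  # initial batch size: one model per Dask worker
)
search._fit()  # run Hyperband, refining the space at every stop/promote
```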