Bayesian-Optimization
ML hyperparameters?
Could this actually be used to optimize machine learning hyperparameters? Must there be an existing dataset of samples?
It would be cool to try optimizing my keras/tf network with this. I know this is research code so I totally understand if it simply isn’t set up for actual use of that sort.
In particular, can it deal with both discrete and continuous parameters, for example the optimizer and the learning rate respectively? I know one can map from continuous to discrete with a categorical encoding, but I'm not sure that's appropriate. I'm here because I was looking at GPyOpt, which looks like it can do what I need, but I came across the paper and it seemed interesting, especially considering the comparison and claims in the paper.
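For concreteness, here is a minimal sketch of the naive continuous-to-discrete mapping mentioned above: a single continuous BO variable in [0, 1) is bucketed into a discrete optimizer choice. The optimizer names and the bucketing scheme are illustrative assumptions, not anything from this repo, and this encoding imposes an artificial ordering on the categories, which is part of why it may not be appropriate.

```python
# Hypothetical sketch: bucket a continuous BO variable into a discrete choice.
# The optimizer list and bucketing are illustrative assumptions only.
OPTIMIZERS = ["sgd", "adam", "rmsprop"]

def decode_optimizer(x: float) -> str:
    """Map a continuous value in [0, 1) to one of the discrete optimizers."""
    idx = min(int(x * len(OPTIMIZERS)), len(OPTIMIZERS) - 1)
    return OPTIMIZERS[idx]
```

Libraries with native categorical support instead treat such variables with one-hot encodings or dedicated kernels, which avoids the spurious ordering.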
This isn’t in a topic I’m super familiar with so please forgive the naive questions, and thanks for your consideration.
Hey, thanks for your interest and sorry for the delay in replying.
Yes, it is research code, but you can definitely do that. As a matter of fact, in an updated version of the paper, I performed BO on tuning OpenAI's PPO baselines. In the following days, I plan to update both the paper on arXiv and the code to include the PPO example, so it will be easier for you to test my algorithm. Other, more polished options include Spearmint, Scikit-optimize, GPyOpt, DiceOptim and Cornell-MOE.
A couple of remarks, in case you try the algorithm:
- OEI is by definition more explorative. Using rough kernels (Matern 3/2 or 5/2) is recommended as they make interpolating regions more interesting.
- Although not properly tested, the code supports noisy objectives [by using the plug-in heuristic of y_min = min(posterior mean of observations)]. Keep in mind that I haven't tested the performance of the algorithm in the case of a noisy objective, as the performance of the different heuristics for handling the noisy case is a discussion on its own (see e.g. Picheny et al).
- Finally, a considerable speedup can be achieved by parallelising the gradient-descent restarts in the optimization function.
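To illustrate the plug-in heuristic from the second remark, here is a minimal sketch (not this repo's code) using scikit-learn: the incumbent y_min is taken as the minimum of the GP posterior mean at the observed inputs, rather than the minimum of the noisy observations themselves. The toy objective and noise level are assumptions for the example; a Matérn 5/2 kernel is used, in line with the first remark.

```python
# Sketch of the plug-in heuristic for noisy objectives:
# y_min = min(posterior mean at the observed inputs).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(20, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=20)  # noisy quadratic, toy data

# Matern 5/2 plus a learned noise term; WhiteKernel absorbs observation noise,
# so the posterior mean below is the de-noised fit.
kernel = Matern(nu=2.5) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

y_min_plugin = gp.predict(X).min()  # plug-in incumbent (smoothed)
y_min_naive = y.min()               # naive incumbent (noisy minimum)
```

The naive incumbent is biased low under noise (the smallest observation is partly a lucky noise draw), which is what the plug-in estimate corrects for.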
The code is now updated and includes the PPO case (see these lines). You could run it like this:
python run.py --function=RoboschoolHopper-v1 --noise=1e-6
The paper is under a peer-review process now; will update it on arXiv after this is over.
Cool, thanks! Yeah, I've been running with GPyOpt, and I just ended up running the random search stage for 2/3 of the steps before enabling the actual optimization for the final third. These SGD-trained objectives are definitely extremely noisy.
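The budget split described above can be sketched generically as follows. This is a hedged illustration of the pattern only, not GPyOpt's API: `model_based_suggest` is a placeholder standing in for a real acquisition-function maximisation over a GP surrogate.

```python
# Sketch of the two-stage budget split: random search for 2/3 of the
# evaluations, then a model-based stage seeded with those samples.
import numpy as np

def model_based_suggest(X, y, bounds, rng):
    # Placeholder: perturb the current best point. A real implementation
    # would maximise an acquisition function under a GP surrogate.
    best = X[np.argmin(y)]
    return float(np.clip(best + rng.normal(scale=0.1), *bounds))

def optimize(objective, bounds, budget, rng):
    n_random = (2 * budget) // 3
    lo, hi = bounds
    X, y = [], []
    for _ in range(n_random):           # stage 1: random search
        x = rng.uniform(lo, hi)
        X.append(x)
        y.append(objective(x))
    for _ in range(budget - n_random):  # stage 2: model-based
        x = model_based_suggest(np.array(X), np.array(y), bounds, rng)
        X.append(x)
        y.append(objective(x))
    best = int(np.argmin(y))
    return X[best], y[best]
```

With a noisy objective, the long random stage gives the surrogate a reasonable global picture before any exploitation happens, which matches the experience described above.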