scikit-optimize
Loading checkpoints behaviour not as expected
I have a deep neural network model in a file called param_optimizer.py, and I import the dimensions and the neural network's name into the file where I run gp_minimize():
from skopt import gp_minimize, callbacks
from skopt.plots import plot_convergence, plot_objective, plot_gaussian_process
from matplotlib import pyplot as plt
from skopt.utils import dump, load
from param_optimizer import dims, search_space
import os

dimensions, network = dims()
output_folder = f"E:/PhD/hyperparam_opt/{network}/output/"
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
checkpoint = f"{output_folder}/checkpoint.pkl"

try:
    res = load(checkpoint)
    x0 = res.x_iters
    y0 = res.func_vals
    print(x0)
    search_results = gp_minimize(search_space, dimensions, x0=x0, y0=y0,
                                 acq_func='EI', n_calls=15, random_state=3, n_jobs=-1,
                                 callback=[callbacks.CheckpointSaver(checkpoint)])
except FileNotFoundError:
    print("Testing whether we got here")
    search_results = gp_minimize(search_space, dimensions,
                                 acq_func='EI', n_calls=15, random_state=3, n_jobs=-1,
                                 callback=[callbacks.CheckpointSaver(checkpoint)])
Now, my model hits OOM every now and then, and I'm using checkpoint.pkl to restart the Bayesian optimization. Unfortunately, the checkpointing mechanism doesn't work as expected. While gp_minimize initializes fine and starts working on the optimization problem, it goes through the same hyperparameters that I've already evaluated, wasting resources instead of evaluating a new set of hyperparameters. Here's the output from the print call in the code above on the third attempt at finding the estimated hyperparameters:
[[1, 0.08399650488680278, 'Adagrad'],
[3, 0.043711823614689324, 'Adagrad'],
[1, 0.024796350839631304, 'Adagrad'],
[1, 0.08399650488680278, 'Adagrad'],
[3, 0.043711823614689324, 'Adagrad'],
[1, 0.024796350839631304, 'Adagrad']]
As you can see, the hyperparameters used in the first and the second (after loading checkpoint) attempts are identical and the checkpointing has not taken effect.
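The repetition is consistent with a seeding effect: if the surrogate is rebuilt from scratch on each restart while the same random_state is passed, the initial random draws come out identical every run. A minimal stdlib-only sketch of that mechanism (this is illustrative, not skopt's actual sampling code; the seed value 3 mirrors the random_state in the call above):

```python
import random

def draw_initial_points(seed, n_points=3):
    """Simulate an optimizer drawing its initial random samples
    from a fixed seed, the way a fresh run with the same
    random_state would."""
    rng = random.Random(seed)
    # Each "point" stands in for one sampled hyperparameter value.
    return [rng.uniform(0.0, 0.1) for _ in range(n_points)]

# Two "restarts" with the same seed and no carried-over RNG state:
first_run = draw_initial_points(seed=3)
second_run = draw_initial_points(seed=3)
print(first_run == second_run)  # True: the same points are proposed again
```

This is why simply reloading x0/y0 is not enough on its own: without also carrying over the optimizer's state, the restart proposes the same points over again.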
Seems like x0 and y0 are passed to the optimizer before optimization: https://github.com/scikit-optimize/scikit-optimize/blob/530da127c0e3d92fc5018115585e73fecced12a5/skopt/optimizer/base.py#L293 Can you investigate?
Right, I had a quick look and at least found a workaround. If you specify base_estimator with the following piece of code, the hyperparameters change:
base_estimator = res.specs['args']['base_estimator']
search_results = gp_minimize(search_space, dimensions, base_estimator=base_estimator,
                             x0=x0, y0=y0, acq_func='EI', n_calls=15,
                             random_state=random_state, xi=0.05, n_jobs=-1,
                             callback=[callbacks.CheckpointSaver(checkpoint),
                                       callbacks.DeltaXStopper(1e-8)])
It may be helpful to clarify this in the documentation?
Nice work! Warm-starting the base estimator certainly makes sense for resuming the optimization where it left off.
If you'd care to amend the relevant example and/or other documentation you're referring to, and then PR those amendments, someone would certainly look to get them merged. :+1:
I've made the PR. I added random_state = res.random_state as well, as I figured that multiple reboots from the checkpoint may have some untested consequences on the random state.
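Putting the thread's pieces together, the resume step ends up restoring three things from the loaded checkpoint: the evaluated points (x0/y0), the base_estimator, and the random_state. A stdlib-only sketch of that pattern, using a SimpleNamespace to stand in for the object skopt.utils.load would return (the attribute names match those used in the posts above, but nothing here calls skopt):

```python
from types import SimpleNamespace

def resume_kwargs(res):
    """Collect the warm-start arguments discussed in this thread
    from a loaded checkpoint result object."""
    return {
        "x0": list(res.x_iters),        # points already evaluated
        "y0": list(res.func_vals),      # their objective values
        "base_estimator": res.specs["args"]["base_estimator"],  # surrogate model
        "random_state": res.random_state,  # RNG state at checkpoint time
    }

# A stand-in for the object loaded from checkpoint.pkl:
res = SimpleNamespace(
    x_iters=[[1, 0.0839, "Adagrad"], [3, 0.0437, "Adagrad"]],
    func_vals=[0.42, 0.37],
    specs={"args": {"base_estimator": "GP"}},
    random_state="rng-state-placeholder",
)

kwargs = resume_kwargs(res)
print(sorted(kwargs))  # ['base_estimator', 'random_state', 'x0', 'y0']
```

The collected dict would then be splatted into the resumed call (gp_minimize(search_space, dimensions, **kwargs, ...)), so all three pieces of state travel together.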