scikit-optimize
Loading checkpoints behaviour not as expected
I have a deep neural network model in a file called param_optimizer.py, and I import the dimensions and the neural network's name into the file where I run gp_minimize():
from skopt import gp_minimize, callbacks
from skopt.plots import plot_convergence, plot_objective, plot_gaussian_process
from matplotlib import pyplot as plt
from skopt.utils import dump, load
from param_optimizer import dims, search_space
import os

dimensions, network = dims()
output_folder = f"E:/PhD/hyperparam_opt/{network}/output/"
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
checkpoint = f"{output_folder}/checkpoint.pkl"

try:
    res = load(checkpoint)
    x0 = res.x_iters
    y0 = res.func_vals
    print(x0)
    search_results = gp_minimize(search_space, dimensions, x0=x0, y0=y0,
                                 acq_func='EI', n_calls=15, random_state=3, n_jobs=-1,
                                 callback=[callbacks.CheckpointSaver(checkpoint)])
except FileNotFoundError:
    print("Testing whether we got here")
    search_results = gp_minimize(search_space, dimensions,
                                 acq_func='EI', n_calls=15, random_state=3, n_jobs=-1,
                                 callback=[callbacks.CheckpointSaver(checkpoint)])
Now, my model hits OOM every now and then, and I'm using checkpoint.pkl to restart the Bayesian optimization. Unfortunately, the checkpointing mechanism doesn't work as expected. While gp_minimize initializes fine and starts working on the optimization problem, it goes through the same hyperparameters that I've already evaluated, wasting resources instead of evaluating a new set of hyperparameters. Here's the output from the print call in the code above on the third attempt at finding the estimated hyperparameters:
[[1, 0.08399650488680278, 'Adagrad'],
[3, 0.043711823614689324, 'Adagrad'],
[1, 0.024796350839631304, 'Adagrad'],
[1, 0.08399650488680278, 'Adagrad'],
[3, 0.043711823614689324, 'Adagrad'],
[1, 0.024796350839631304, 'Adagrad']]
As you can see, the hyperparameters used in the first and the second (after loading checkpoint) attempts are identical and the checkpointing has not taken effect.
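The repetition is consistent with a seeding effect: if the surrogate is rebuilt from scratch on each restart while the same random_state is passed, the initial random draws come out identical every run. A minimal stdlib-only sketch of that mechanism (this is illustrative, not skopt's actual sampling code; the seed value 3 mirrors the random_state in the call above):

```python
import random

def draw_initial_points(seed, n_points=3):
    """Simulate an optimizer drawing its initial random samples
    from a fixed seed, the way a fresh run with the same
    random_state would."""
    rng = random.Random(seed)
    # Each "point" stands in for one sampled hyperparameter value.
    return [rng.uniform(0.0, 0.1) for _ in range(n_points)]

# Two "restarts" with the same seed and no carried-over RNG state:
first_run = draw_initial_points(seed=3)
second_run = draw_initial_points(seed=3)
print(first_run == second_run)  # True: the same points are proposed again
```

This is why simply reloading x0/y0 is not enough on its own: without also carrying over the optimizer's state, the restart proposes the same points over again.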
Seems like x0 and y0 are passed to the optimizer before optimization: https://github.com/scikit-optimize/scikit-optimize/blob/530da127c0e3d92fc5018115585e73fecced12a5/skopt/optimizer/base.py#L293 Can you investigate?
Right, I had a quick look and at least found a workaround. If you specify base_estimator with the following piece of code, the hyperparameters change:
base_estimator = res.specs['args']['base_estimator']
search_results = gp_minimize(search_space, dimensions, base_estimator=base_estimator,
                             x0=x0, y0=y0, acq_func='EI', n_calls=15,
                             random_state=random_state, xi=0.05, n_jobs=-1,
                             callback=[callbacks.CheckpointSaver(checkpoint),
                                       callbacks.DeltaXStopper(1e-8)])
It may be helpful to clarify this in the documentation?
Nice work! Warm-starting the base estimator certainly makes sense for resuming the optimization where it left off.
If you'd care to amend the relevant example and/or other documentation you're referring to, and then PR those amendments, someone would certainly look to get them merged. :+1:
I've made the PR. I added random_state = res.random_state as well, as I figured that multiple reboots from the checkpoint may have some untested consequences on the random state.
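Putting the thread's pieces together, the resume step ends up restoring three things from the loaded checkpoint: the evaluated points (x0/y0), the base_estimator, and the random_state. A stdlib-only sketch of that pattern, using a SimpleNamespace to stand in for the object skopt.utils.load would return (the attribute names match those used in the posts above, but nothing here calls skopt):

```python
from types import SimpleNamespace

def resume_kwargs(res):
    """Collect the warm-start arguments discussed in this thread
    from a loaded checkpoint result object."""
    return {
        "x0": list(res.x_iters),        # points already evaluated
        "y0": list(res.func_vals),      # their objective values
        "base_estimator": res.specs["args"]["base_estimator"],  # surrogate model
        "random_state": res.random_state,  # RNG state at checkpoint time
    }

# A stand-in for the object loaded from checkpoint.pkl:
res = SimpleNamespace(
    x_iters=[[1, 0.0839, "Adagrad"], [3, 0.0437, "Adagrad"]],
    func_vals=[0.42, 0.37],
    specs={"args": {"base_estimator": "GP"}},
    random_state="rng-state-placeholder",
)

kwargs = resume_kwargs(res)
print(sorted(kwargs))  # ['base_estimator', 'random_state', 'x0', 'y0']
```

The collected dict would then be splatted into the resumed call (gp_minimize(search_space, dimensions, **kwargs, ...)), so all three pieces of state travel together.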