[QUESTION] Chapter 10: RandomizedSearchCV's best model performs much worse than expected.
Notebook: 10_neural_nets_with_keras.ipynb
Cell: 102
Problem:
After running RandomizedSearchCV with exactly the same hyperparameters as in the book, the returned best model performs far worse than the best results observed during the search.
The key symptom:

```python
model = rnd_search_cv.best_estimator_.model()
model.evaluate(X_test, y_test)
```

```
162/162 [==============================] - 0s 1ms/step - loss: 4.5746
4.57464599609375
```
Expected: a loss close to the best observed during the RandomizedSearchCV run, e.g.:

```
Epoch 96/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2351 - val_loss: 0.2832
```
To Reproduce
```python
def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[8]):
    model = keras.models.Sequential()
    options = {"input_shape": input_shape}
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu", **options))
        options = {}
    model.add(keras.layers.Dense(1, **options))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model
```
`keras.wrappers.scikit_learn.KerasRegressor` is no longer available in recent TF/Keras versions, so scikeras is used instead:
```python
from scikeras.wrappers import KerasRegressor

keras_reg = KerasRegressor(model=build_model, learning_rate=None,
                           n_hidden=None, n_neurons=None)

from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons": np.arange(1, 100),
    "learning_rate": reciprocal(3e-4, 3e-2),
}
rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])
```
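For context, `reciprocal(3e-4, 3e-2)` is a log-uniform distribution on that interval (scipy also exposes it under the alias `loguniform`), so the learning rate is explored evenly across orders of magnitude. A quick sanity check of the sampled range:

```python
from scipy.stats import reciprocal

# reciprocal(a, b) draws log-uniform samples on [a, b].
dist = reciprocal(3e-4, 3e-2)
samples = dist.rvs(size=1000, random_state=42)
print(samples.min() >= 3e-4 and samples.max() <= 3e-2)  # True
```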
The last few lines of the training output:
```
363/363 [==============================] - 1s 2ms/step - loss: 0.2377 - val_loss: 0.3169
Epoch 93/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2348 - val_loss: 0.2939
Epoch 94/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2333 - val_loss: 0.2923
Epoch 95/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2353 - val_loss: 0.2991
Epoch 96/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2351 - val_loss: 0.2832
Epoch 97/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2326 - val_loss: 0.2875
Epoch 98/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2324 - val_loss: 0.2996
Epoch 99/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2308 - val_loss: 0.3063
Epoch 100/100
363/363 [==============================] - 1s 2ms/step - loss: 0.2318 - val_loss: 0.2876
```
Best params:
```python
rnd_search_cv.best_params_, rnd_search_cv.best_score_
```

```
({'learning_rate': 0.009847435984689484, 'n_hidden': 3, 'n_neurons': 74},
 0.7897341518291179)
```
In the book:
- learning_rate=0.0058 (small differences are expected)
- score = -0.32039451599121094
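The sign and scale difference in the score is probably not the bug itself: the old `keras.wrappers` returned the negative loss as the score (hence the book's -0.32), while scikit-learn/scikeras regressors default to R², which is maximized and tops out at 1.0. A minimal sklearn-only sketch, using `Ridge` as a stand-in estimator (the estimator and synthetic data are illustrative, not from the notebook):

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic near-linear regression data (8 features, small noise).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)

search = RandomizedSearchCV(Ridge(), {"alpha": loguniform(3e-4, 3e-2)},
                            n_iter=5, cv=3, random_state=42)
search.fit(X, y)
# With no explicit `scoring`, best_score_ is the mean cross-validated R^2:
# it lives in (-inf, 1] and higher is better -- it is not a loss.
print(search.best_score_)
```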
Important debug information:
```python
model.summary()
```

```
Model: "sequential_98"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense_279 (Dense)           (None, 30)                270
 dense_280 (Dense)           (None, 1)                 31
=================================================================
Total params: 301 (1.18 KB)
Trainable params: 301 (1.18 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
```
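The summary above suggests the evaluated model was built with the *default* hyperparameters: 301 params is exactly one 30-neuron hidden layer on 8 inputs, (8+1)*30 + (30+1)*1, not the tuned n_hidden=3, n_neurons=74. If scikeras follows the usual scikit-learn convention, the wrapper's `model` attribute holds the *build callable* you passed in, while the fitted Keras model lives in `model_` after `fit()`; calling `.model()` would therefore rebuild a fresh, untrained model with default arguments. A pure-Python sketch of that trap (hypothetical class, not scikeras itself):

```python
def build_model(n_hidden=1, n_neurons=30):
    # Stand-in for the real Keras builder: just records its arguments.
    return {"n_hidden": n_hidden, "n_neurons": n_neurons}

class WrapperSketch:
    """Mimics how a scikeras-style wrapper stores the builder vs. the fit result."""
    def __init__(self, model):
        self.model = model                       # the build callable, unfitted
    def fit(self, **best_params):
        self.model_ = self.model(**best_params)  # built with the tuned params
        return self

w = WrapperSketch(build_model).fit(n_hidden=3, n_neurons=74)
print(w.model())   # {'n_hidden': 1, 'n_neurons': 30} -- defaults, like the 301-param summary
print(w.model_)    # {'n_hidden': 3, 'n_neurons': 74} -- the tuned model

# Param-count check for the default Keras model: dense(8->30) + dense(30->1)
assert (8 + 1) * 30 + (30 + 1) * 1 == 301
```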
Expected behavior
`best_estimator_` should expose the model trained with the best hyperparameters found by the search. Instead, the model returned by `.model()` appears to be built with the default hyperparameters (a single 30-neuron hidden layer, per the summary above) rather than the tuned n_hidden=3, n_neurons=74.
Versions:
- Windows 11
- Python: 3.11.4
- TensorFlow: 2.13.0
- keras: 2.13.1
- Scikit-Learn: 1.2.2
- scikeras: 0.11.0
A simple workaround is to rebuild and retrain the model manually with the best hyperparameters:
```python
md = build_model(**rnd_search_cv.best_params_)
md.fit(X_train, y_train, epochs=100,
       validation_data=(X_valid, y_valid),
       callbacks=[keras.callbacks.EarlyStopping(patience=10)])
md.evaluate(X_test, y_test)
```

```
162/162 [==============================] - 0s 1ms/step - loss: 0.3111
0.3110610246658325
```
This produces results much closer to the book's, but rebuilding and retraining by hand is inconvenient.