
[QUESTION] Regarding Hyperparameter Tuning of NN with keras/sklearn

Open chrisflip opened this issue 3 years ago • 1 comment

Hi, first of all, thanks for this amazing book. I have a question regarding chapter 10, hyperparameter tuning with keras and sklearn: the model allows for multiple hidden layers. However, I believe that n_neurons is fixed across all hidden layers. How can I make the model more flexible so that n_neurons can change with every layer?

Best, Chris

from tensorflow import keras

def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[8]):  # <=== n_neurons ???
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))  # <=== ???
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model

keras_reg = keras.wrappers.scikit_learn.KerasRegressor(build_model)

keras_reg.fit(X_train, y_train, epochs=100,
              validation_data=(X_valid, y_valid),
              callbacks=[keras.callbacks.EarlyStopping(patience=10)])

import numpy as np
from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons": np.arange(1, 100).tolist(),  # <=== ???
    "learning_rate": reciprocal(3e-4, 3e-2).rvs(1000).tolist(),
}

rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3, verbose=2)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])

chrisflip commented on Aug 23, 2022

Hi @chrisflip ,

Thanks for your question and sorry for the late reply.

I see two options:

  1. add one parameter per layer, e.g., n_neurons1, n_neurons2, etc.
  2. add a single n_neurons parameter that holds a list of layer sizes (e.g., [100, 50, 10]) and use a custom sampler in the param_distribs dictionary to draw from this multi-dimensional space (see the sketch after this list).
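
For example, here is a rough sketch of option 2, assuming a modified build_model that takes a list of layer sizes, plus a small sampler object with an rvs() method (RandomizedSearchCV calls rvs(random_state=...) on any distribution-like value). The names n_neurons_per_layer and RandomArchitecture are just placeholders, not code from the book:

import numpy as np
from scipy.stats import reciprocal
from sklearn.utils import check_random_state
from tensorflow import keras

def build_model(n_neurons_per_layer=[30], learning_rate=3e-3, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for n_neurons in n_neurons_per_layer:  # one hidden layer per list entry
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model

class RandomArchitecture:
    """Samples a random list of layer sizes, e.g. [87, 23, 5]."""
    def __init__(self, max_hidden=3, max_neurons=100):
        self.max_hidden = max_hidden
        self.max_neurons = max_neurons
    def rvs(self, random_state=None):
        rng = check_random_state(random_state)
        n_hidden = rng.randint(0, self.max_hidden + 1)  # 0 to max_hidden layers
        return [int(rng.randint(1, self.max_neurons)) for _ in range(n_hidden)]

param_distribs = {
    "n_neurons_per_layer": RandomArchitecture(),
    "learning_rate": reciprocal(3e-4, 3e-2).rvs(1000).tolist(),
}

Option 1 works the same way: just add n_neurons1, n_neurons2, etc. to build_model's signature and one entry per layer to param_distribs.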

That said, I don't think it's necessary. People used to do this, but it complicates things, and in practice it didn't really help. Using the same number of neurons in each layer usually works fine. There's essentially one exception: you may want a bottleneck layer in the middle, as in autoencoders, but that only requires one additional parameter (see the sketch below).
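
For the bottleneck case, a minimal sketch could look like this (bottleneck_neurons is a made-up parameter name, not from the book):

from tensorflow import keras

def build_model(n_hidden=2, n_neurons=30, bottleneck_neurons=10,
                learning_rate=3e-3, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):  # wide layers before the bottleneck
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    model.add(keras.layers.Dense(bottleneck_neurons, activation="relu"))  # the bottleneck
    for layer in range(n_hidden):  # wide layers after the bottleneck
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model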

Hope this helps.

ageron commented on Sep 26, 2022