[QUESTION] Regarding Hyperparameter Tuning of NN with keras/sklearn
Hi,
first of all, thanks for this amazing book. I have a question regarding chapter 10, hyperparameter tuning with keras and sklearn:
The model allows for multiple hidden layers. However, I believe that `n_neurons` is fixed across all hidden layers. How can I make the model more flexible so that `n_neurons` can change with every layer?
Best, Chris
```python
import numpy as np
from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV
from tensorflow import keras

def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[8]):  # <=== n_neurons ???
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))  # <=== ???
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model

keras_reg = keras.wrappers.scikit_learn.KerasRegressor(build_model)
keras_reg.fit(X_train, y_train, epochs=100,
              validation_data=(X_valid, y_valid),
              callbacks=[keras.callbacks.EarlyStopping(patience=10)])

param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons": np.arange(1, 100).tolist(),
    "learning_rate": reciprocal(3e-4, 3e-2).rvs(1000).tolist(),
}
rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3, verbose=2)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])
```
Hi @chrisflip,
Thanks for your question and sorry for the late reply.
I see two options:
- add one parameter per layer, e.g., `n_neurons1`, `n_neurons2`, etc.
- add one parameter `n_neurons` that contains a list of the number of neurons per layer (e.g., `[100, 50, 10]`) and use a custom function in the `param_distribs` dictionary to sample from this multi-dimensional space (see the sketch below).
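Here is a rough, untested sketch of the second option (the `LayerSizeSampler` class and its arguments are just illustrative names, not from the book): `build_model()` takes a list of layer sizes, and a small custom "distribution" object with an `rvs()` method draws a random architecture for each trial.

```python
from scipy.stats import reciprocal
from sklearn.utils import check_random_state
from tensorflow import keras

def build_model(n_neurons=(30,), learning_rate=3e-3, input_shape=[8]):
    # n_neurons is now a sequence with one entry per hidden layer,
    # so its length also replaces the separate n_hidden parameter
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for units in n_neurons:
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model

class LayerSizeSampler:
    """Custom sampler: RandomizedSearchCV calls rvs() once per trial."""
    def __init__(self, max_hidden=3, low=1, high=100):
        self.max_hidden, self.low, self.high = max_hidden, low, high
    def rvs(self, random_state=None):
        rng = check_random_state(random_state)
        n_hidden = rng.randint(0, self.max_hidden + 1)
        return rng.randint(self.low, self.high, size=n_hidden).tolist()

param_distribs = {
    "n_neurons": LayerSizeSampler(),        # e.g. [87, 12] or [42, 5, 63]
    "learning_rate": reciprocal(3e-4, 3e-2),
}
```

The `KerasRegressor` wrapper and the `rnd_search_cv.fit(...)` call stay the same as in your snippet. The first option is even simpler: add `n_neurons1`, `n_neurons2`, etc. as separate `build_model()` arguments and give each of them its own entry in `param_distribs`.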
That said, I don't think it's necessary. People used to do this, but it would complicate things, and in practice it didn't really help. Using the same number of neurons at each layer usually works fine. There's essentially one exception: you may want a bottleneck layer in the middle, like in autoencoders, but this only requires one additional parameter.
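For example, a rough sketch of that single extra parameter (the name `bottleneck_neurons` is made up for illustration):

```python
from tensorflow import keras

def build_model(n_hidden=2, n_neurons=30, bottleneck_neurons=10,
                learning_rate=3e-3, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    # single narrow layer in the middle, like an autoencoder's coding layer
    model.add(keras.layers.Dense(bottleneck_neurons, activation="relu"))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model
```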
Hope this helps.