
Early stopping halts training after second epoch with default patience

Open mogeid opened this issue 3 years ago • 5 comments

The EarlyStopping callback tracks the number of epochs without improvement through the wait attribute: whenever there is improvement, this value is reset to zero; when there is not, it is incremented by one. However, the patience attribute of EarlyStopping objects defaults to zero, so the condition self.wait >= self.patience in this line is always satisfied. With the default patience, this stops training after the second epoch regardless of improvement.
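
To illustrate, here is a simplified sketch of that logic (illustrative only, not the actual Keras source; names and loss values are placeholders):

patience = 0                  # default value of the patience argument
wait = 0
best = float('inf')
for epoch, current in enumerate([0.0926, 0.0879]):   # loss improves every epoch
    wait += 1                 # counter is bumped before the improvement check
    if current < best:        # an improvement resets the counter...
        best = current
        wait = 0
    if wait >= patience and epoch > 0:   # ...but 0 >= 0 still holds
        print(f'Epoch {epoch + 1:05d}: early stopping')
        break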

mogeid avatar Apr 10 '22 17:04 mogeid

@mogeid In order to expedite the troubleshooting process, please provide a code snippet to reproduce the issue reported here. Thanks!

sushreebarsa avatar Apr 14 '22 15:04 sushreebarsa

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Apr 21 '22 16:04 google-ml-butler[bot]

@sushreebarsa I suppose a minimal working example could look something like this:

import numpy as np
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

inp = Input(shape=(32,))
out = Dense(1)(inp)
model = Model(inp, out)
model.compile(loss='mse', optimizer='sgd')
x = np.random.rand(512, 32)
y = x.max(-1)**2 - x.min(-1)
early_stopping = EarlyStopping(monitor='loss', verbose=1, mode='min')  # patience left at its default of 0
model.fit(x, y, shuffle=True, epochs=10, callbacks=[early_stopping])

#Epoch 1/10
#16/16 [==============================] - 0s 611us/step - loss: 0.0926
#Epoch 2/10
#16/16 [==============================] - 0s 769us/step - loss: 0.0879
#Epoch 00002: early stopping

Note that training stops despite the improvement in the monitored quantity.
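
As a workaround until the fix lands, passing an explicit patience keeps training from stopping prematurely (a minimal sketch reusing the model, x and y from the snippet above; patience=3 is an illustrative value):

early_stopping = EarlyStopping(monitor='loss', patience=3, verbose=1, mode='min')
model.fit(x, y, shuffle=True, epochs=10, callbacks=[early_stopping])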

mogeid avatar Apr 27 '22 18:04 mogeid

@gowthamkpr I was able to replicate the issue on Colab using TF v2.8.0; please find the gist here for reference. Thanks!

sushreebarsa avatar Apr 28 '22 06:04 sushreebarsa

@mogeid Thank you for reporting the bug. I created a PR to fix it.

gowthamkpr avatar May 03 '22 00:05 gowthamkpr

@mogeid, I executed the mentioned code on both TensorFlow v2.8 and the latest stable v2.12, and the outputs differ.

With v2.8, training stops after epoch 2, whereas with v2.12 the code runs for all of the specified epochs. Kindly find the gist here. Thank you!

tilakrayal avatar May 10 '23 16:05 tilakrayal

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.

github-actions[bot] avatar Jun 03 '23 02:06 github-actions[bot]
