
Training returns keras.callbacks.History and does not complete the total number of epochs

Open fabioharry opened this issue 2 years ago • 2 comments

Training ends before reaching the configured number of epochs, for both detector and recognizer training.

Why is this happening?

I'm using the example notebook; the same thing happens whether I train on Google Colab or paperspace.com.

https://console.paperspace.com/fabioharry/notebook/rhnaf58unfsnkbp

Is there a way to continue from a checkpoint instead of starting from scratch every time?

# Variables such as data_dir, image_generators, background_splits and detector
# come from the keras-ocr end-to-end training example notebook; os, math,
# datetime and tensorflow (tf) are imported earlier in the notebook.
detector_batch_size = 2
detector_basepath = os.path.join(data_dir, f'detector_{datetime.datetime.now().isoformat()}')
# Build train/validation/test batch generators from the image generators.
detection_train_generator, detection_val_generator, detection_test_generator = [
    detector.get_batch_generator(
        image_generator=image_generator,
        batch_size=detector_batch_size
    ) for image_generator in image_generators
]
detector.model.fit_generator(
    generator=detection_train_generator,
    steps_per_epoch=math.ceil(len(background_splits[0]) / detector_batch_size),
    epochs=50,
    workers=0,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(restore_best_weights=True, patience=5),
        tf.keras.callbacks.CSVLogger(f'{detector_basepath}.csv'),
        tf.keras.callbacks.ModelCheckpoint(filepath=f'{detector_basepath}.h5')
    ],
    validation_data=detection_val_generator,
    validation_steps=math.ceil(len(background_splits[1]) / detector_batch_size)
)

Epoch 7/50
827/827 [==============================] - 316s 382ms/step - loss: 0.0043 - val_loss: 0.0114
Epoch 8/50
827/827 [==============================] - 324s 391ms/step - loss: 0.0050 - val_loss: 0.0158
Epoch 9/50
827/827 [==============================] - 316s 382ms/step - loss: 0.0052 - val_loss: 0.0054
Epoch 10/50
827/827 [==============================] - 326s 395ms/step - loss: 0.0035 - val_loss: 0.0117
Epoch 11/50
827/827 [==============================] - 361s 437ms/step - loss: 0.0041 - val_loss: 0.0107
<keras.callbacks.History at 0x7f9e5a0c2490>

fabioharry · Jun 15 '22 19:06

You have an EarlyStopping callback with a patience of 5:

tf.keras.callbacks.EarlyStopping(restore_best_weights=True, patience=5)

It stops training once val_loss has not improved for 5 consecutive epochs, which is why the run ends before epoch 50. Remove it (or increase the patience) if you want training to run for the full number of epochs.
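
To continue from a checkpoint instead of starting from scratch, here is a minimal sketch. It assumes the variables from the example notebook (detector, the batch generators, background_splits, detector_batch_size, detector_basepath) are still in scope, and that checkpoint_path (a placeholder name here) points to the .h5 file written by ModelCheckpoint in the earlier run. Load those weights before fitting again, and drop EarlyStopping (or give it a larger patience) so all 50 epochs can run.

# checkpoint_path is a placeholder: point it at the .h5 file your previous
# run's ModelCheckpoint callback wrote.
checkpoint_path = 'path/to/previous/detector_checkpoint.h5'
detector.model.load_weights(checkpoint_path)  # resume from saved weights

detector.model.fit_generator(
    generator=detection_train_generator,
    steps_per_epoch=math.ceil(len(background_splits[0]) / detector_batch_size),
    epochs=50,
    workers=0,
    callbacks=[
        # EarlyStopping removed so the full 50 epochs run; alternatively keep
        # it with a much larger patience.
        tf.keras.callbacks.CSVLogger(f'{detector_basepath}.csv'),
        tf.keras.callbacks.ModelCheckpoint(filepath=f'{detector_basepath}.h5')
    ],
    validation_data=detection_val_generator,
    validation_steps=math.ceil(len(background_splits[1]) / detector_batch_size)
)

Note that ModelCheckpoint saves after every epoch by default, so the file at detector_basepath always holds the most recent weights you can resume from.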

Bassem16 · Aug 06 '22 13:08