
Saving and Loading model errors

Open Pixel535 opened this issue 1 year ago • 7 comments

Hi, I am trying to train my model on my own dataset following the tutorial. Training sometimes takes quite a long time, so I wanted to load the model saved by the checkpoint callback using this code:

        if os.path.exists("Model/model.h5"):
            HTR_Model = load_model("Model/model.h5")
            new_model = False
        else:
            img_shape = (self.height, self.width, 3)
            HTR_Model = self.HTR_Model(img_shape, characters_num, vocab)
            HTR_Model.compile_model()
            HTR_Model.summary(line_length=110)
            new_model = True

And then continue training with this code:

        earlystopper = EarlyStopping(monitor='val_CER', patience=20, verbose=1, mode='min')
        checkpoint = ModelCheckpoint("Model/model.h5", monitor='val_CER', verbose=1, save_best_only=True, mode='min')
        trainLogger = TrainLogger("Model")
        tb_callback = TensorBoard('Model/logs', update_freq=1)
        reduceLROnPlat = ReduceLROnPlateau(monitor='val_CER', factor=0.9, min_delta=1e-10, patience=10, verbose=1,
                                           mode='auto')
        model2onnx = Model2onnx("Model/model.h5")

        if new_model is True:
            HTR_Model.train(training_data,
                            val_data,
                            epochs=1000,
                            workers=20,
                            callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx])
        else:
            HTR_Model.fit(training_data,
                          validation_data=val_data,
                          epochs=1000,
                          workers=20,
                          callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx],
                          )

Unfortunately, I encountered the following error: ValueError: Unknown loss function: CTCloss. Please ensure this object is passed to the custom_objects argument.

So I tried to add this argument like this:

        HTR_Model = load_model("Model/model.h5", custom_objects={'CTCloss': CTCloss})

But it didn't work, and I got this error: TypeError: CTCloss.__init__() got an unexpected keyword argument reduction
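For context on this TypeError: Keras-style loss classes are typically rebuilt on load by calling `cls(**config)`, where the saved config contains both `reduction` and `name`. The sketch below reproduces that round trip in plain Python, without TensorFlow. `BaseLoss`, `NarrowLoss`, `ForwardingLoss`, and `round_trip` are hypothetical stand-ins for illustration only; this is not the mltu or Keras source, and it assumes the custom loss constructor accepts only `name`.

```python
class BaseLoss:
    """Stand-in for a Keras-style loss base class (not the real tf.keras class)."""

    def __init__(self, reduction="auto", name=None):
        self.reduction = reduction
        self.name = name

    def get_config(self):
        # The saved config carries every base-class constructor argument.
        return {"reduction": self.reduction, "name": self.name}

    @classmethod
    def from_config(cls, config):
        # On load, the class is re-created from the saved config.
        return cls(**config)


class NarrowLoss(BaseLoss):
    """Accepts only `name`, so a config containing `reduction` is rejected."""

    def __init__(self, name="NarrowLoss"):
        super().__init__(name=name)


class ForwardingLoss(BaseLoss):
    """Forwards extra keyword arguments (such as `reduction`) to the base class."""

    def __init__(self, name="ForwardingLoss", **kwargs):
        super().__init__(name=name, **kwargs)


def round_trip(loss):
    """Serialize a loss to its config and rebuild it, as a loader would."""
    return type(loss).from_config(loss.get_config())


if __name__ == "__main__":
    try:
        round_trip(NarrowLoss())
    except TypeError as err:
        print("NarrowLoss failed:", err)

    restored = round_trip(ForwardingLoss())
    print("ForwardingLoss restored with reduction =", restored.reduction)
```

If the real loss class behaves like `NarrowLoss`, letting its constructor accept and forward `**kwargs` is one possible way to make it loadable through `custom_objects`.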

I couldn't solve it, so I started looking for other ways to load the model. This time I saved the model in .tf format and loaded it without the custom_objects argument, which caused this error: Unable to restore custom object of type _tf_keras_metric. Please make sure that any custom layers are included in the custom_objects arg when calling load_model() and make sure that all layers implement get_config and from_config.

After that, I added the argument like this:

        HTR_Model = load_model("Model/model.tf", custom_objects={'CERMetric': CERMetric(vocabulary=vocab), 'WERMetric': WERMetric(vocabulary=vocab)})

And the error was TypeError: CERMetric.__init__() missing 1 required positional argument: 'vocabulary', even though I passed this argument. The only thing that works is this code:

        HTR_Model = load_model("Model/model.h5", compile=False)
        HTR_Model.compile(loss=CTCloss(), metrics=[CERMetric(vocabulary=vocab), WERMetric(vocabulary=vocab)], run_eagerly=False)

But it doesn't seem to load all the weights. I also tried BackupAndRestore to pick up where I left off, but I still couldn't tell whether it saves those weights and keeps using them. So is it possible to load a saved model after training is interrupted and continue training it in a way that stays consistent with the tutorial? For example, I am at epoch 53/1000 and I see that the best value so far was saved to model.h5 at epoch 52, so I stop training and then want to load the model saved at epoch 52 and continue from there.
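One way to check whether a reloaded model really carries the saved weights is to compare the arrays returned by `get_weights()` before saving and after loading. The following is a minimal sketch, assuming NumPy is available. `weights_match` is a hypothetical helper, and the arrays in the demo are stand-ins for what `model.get_weights()` would return.

```python
import numpy as np


def weights_match(weights_a, weights_b, atol=0.0):
    """Return True if two lists of weight arrays agree element-wise.

    Each argument is expected to look like the output of model.get_weights():
    a list of NumPy arrays, one per weight tensor, in a fixed order.
    """
    if len(weights_a) != len(weights_b):
        return False
    return all(
        # Compare shapes first so that broadcasting cannot hide a mismatch.
        a.shape == b.shape and np.allclose(a, b, rtol=0.0, atol=atol)
        for a, b in zip(weights_a, weights_b)
    )


if __name__ == "__main__":
    # Stand-in arrays; in practice these would come from model.get_weights().
    saved = [np.full((2, 3), 0.5), np.arange(3, dtype=np.float32)]
    reloaded = [w.copy() for w in saved]
    reinitialized = [np.zeros_like(w) for w in saved]

    print("reloaded matches saved:", weights_match(saved, reloaded))
    print("reinitialized matches saved:", weights_match(saved, reinitialized))
```

Even when the layer weights match, calling `compile()` again after `load_model(..., compile=False)` creates a fresh optimizer, so optimizer state (for example, a learning rate already lowered by ReduceLROnPlateau) would likely start over.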

Pixel535, Aug 01 '23 18:08