
Add option to load model weights from checkpoint before starting to t…

Open Viktor-Nilsson opened this issue 3 years ago • 8 comments

Added an optional parameter that allows passing a path to a checkpoint file when calling object_detector.create(). If a checkpoint path is passed, the underlying tf.keras.Model loads the model weights from the checkpoint before training starts.
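The behaviour described above amounts to "restore if a path was given, then train". Below is a minimal, framework-agnostic sketch of that control flow. It assumes only that the model exposes load_weights(path) and fit(data, epochs=...), as tf.keras.Model does. The names train_with_optional_checkpoint and RecordingModel are hypothetical, and the parameter name load_checkpoint_path is borrowed from the snippet quoted later in this thread. This is not the PR's actual code, and a later comment reports that load_weights() did not restore EfficientDet checkpoints in that user's case.

```python
from typing import Any, List, Optional, Tuple


def train_with_optional_checkpoint(model: Any, train_data: Any, epochs: int,
                                   load_checkpoint_path: Optional[str] = None) -> Any:
    """Hypothetical helper: optionally restore weights, then train.

    `model` is assumed to behave like a tf.keras.Model, i.e. to provide
    `load_weights(path)` and `fit(data, epochs=...)`.
    """
    if load_checkpoint_path is not None:
        # Restore previously trained weights so that training continues from
        # them instead of from the model's default initial weights.
        model.load_weights(load_checkpoint_path)
    return model.fit(train_data, epochs=epochs)


class RecordingModel:
    """Stand-in for a tf.keras.Model that only records the order of calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def load_weights(self, path: str) -> None:
        self.calls.append(("load_weights", path))

    def fit(self, data: Any, epochs: int) -> str:
        self.calls.append(("fit", epochs))
        return "history"


# Usage: the checkpoint is restored strictly before fit() is called.
model = RecordingModel()
train_with_optional_checkpoint(model, None, epochs=3, load_checkpoint_path="ckpt-100")
print(model.calls)  # [('load_weights', 'ckpt-100'), ('fit', 3)]
```

Defaulting the parameter to None keeps the existing call sites unchanged: without a path, training behaves exactly as before.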

Viktor-Nilsson, Jan 18 '22 09:01

@MarkDaoust I'm not sure how to do that well. The existing test uses a randomly generated .jpg (to avoid binary files in the git repo?).

I could add a new test case that loads one of my existing trained checkpoints, evaluates the model, and verifies that the weights were loaded by checking that the AP is high enough. However, adding a checkpoint file for efficientdet-lite0 to the git repo is not so nice, since it is ~33 MB of binary data. Thoughts?

Viktor-Nilsson, Jan 18 '22 18:01

I'm not sure why the CODEOWNERS file didn't assign Khanh and Lu directly. They're the real owners here.

MarkDaoust, Jan 18 '22 18:01

Because the pattern is wrong. This subdirectory is only covered by your global entry.

bhack, Jan 18 '22 19:01

Oh, right. I'll send a fix for that.

MarkDaoust, Jan 18 '22 19:01

@ziyeqinghan Could you take a look?

khanhlvg, Jan 19 '22 00:01

I tried training a model from a checkpoint, but during training the losses became NaN. Is there any way around this, or am I doing something wrong? [image: error]

ThuhinSatheesh, Feb 21 '22 15:02

Hi, and thanks for this. I would add that loading the weights with the model.load_weights() method didn't work in my case. Instead, I restored the checkpoint from model_dir by importing restore_ckpt in the object_detector_spec.py file and calling it in the if block before the model.fit() call, as you suggested:

from tensorflow_examples.lite.model_maker.third_party.efficientdet.keras.util_keras import restore_ckpt

# Restore the EfficientDet weights from the checkpoint before training starts.
if load_checkpoint_path is not None:
    restore_ckpt(model, load_checkpoint_path)

From what I understand, this is because checkpoints for EfficientDetNetTrainHub are different and need a custom function to be restored correctly, though I'm not sure about it. Make sure that the load_checkpoint_path directory contains ckpt-xx.data-xxx, ckpt-xx.index, and a plain-text checkpoint file naming the checkpoint you want to restore, e.g.:

From my terminal, in the model_dir path, cat checkpoint gives:

model_checkpoint_path: "ckpt-100"
all_model_checkpoint_paths: "ckpt-100"
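A quick way to confirm that a directory has the layout described above is to read the checkpoint state file and check that the files it points to exist. TensorFlow's own tf.train.latest_checkpoint(model_dir) reads the same file; the standard-library sketch below is only a hypothetical pre-flight check (resolve_checkpoint_prefix is an illustrative name, not part of Model Maker) that returns the full checkpoint prefix or None.

```python
import glob
import os
import re
from typing import Optional


def resolve_checkpoint_prefix(model_dir: str) -> Optional[str]:
    """Return the checkpoint prefix named in `model_dir/checkpoint`, or None.

    Expects a plain-text `checkpoint` file with a line such as
    model_checkpoint_path: "ckpt-100", next to `ckpt-100.index` and
    at least one `ckpt-100.data-*` shard.
    """
    state_file = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file, encoding="utf-8") as f:
        # Anchored at line start, so it does not match all_model_checkpoint_paths.
        match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', f.read(), re.MULTILINE)
    if match is None:
        return None
    prefix = match.group(1)
    if not os.path.isabs(prefix):
        # Relative prefixes are resolved against the checkpoint directory.
        prefix = os.path.join(model_dir, prefix)
    # Both the index file and at least one data shard must be present.
    if not os.path.isfile(prefix + ".index"):
        return None
    if not glob.glob(glob.escape(prefix) + ".data-*"):
        return None
    return prefix
```

For the example above, calling it on model_dir would return the path ending in ckpt-100, which could then be passed as load_checkpoint_path.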

IvanColantoni, Apr 07 '22 15:04

Since there is no option to create issues here, I have a question: how do I do multi-GPU training with TFLite Model Maker? https://github.com/tensorflow/examples/blob/master/tensorflow_examples/lite/model_maker/core/task/object_detector.py#L73-L75

imneonizer, Jul 01 '22 12:07

This does not seem to be in the actual code, yet I see a commit here. What is the status?

grewe, Oct 26 '22 22:10

Closing the PR, since it was reported not to work for others who attempted to use the code, and I have no capacity to investigate it further.

Viktor-Nilsson, Oct 27 '22 07:10

@Viktor-Nilsson This worked for me when I tried it.

Bede-sv, Jan 11 '23 02:01

I'd be keen to get this supported too, as I am sure many others would be: the ability to keep improving your own custom model, without wasting GPU time retraining on data you've already trained on, is key.

justingrayston, Apr 14 '23 15:04