TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10

How to resume training from the latest checkpoint?

Open tolotrasamuel opened this issue 7 years ago • 8 comments

I guess there should be some parameter to edit. Which one?

tolotrasamuel avatar Jul 23 '18 06:07 tolotrasamuel

If you run the train script, it automatically picks up the last checkpoint and resumes training from there. Does that answer your question?

patrickmlaw avatar Jul 23 '18 21:07 patrickmlaw
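To see which checkpoint a training folder currently holds, a small script can scan it. This is only a sketch under assumptions: the function name `latest_checkpoint_prefix` is made up for illustration, and it assumes checkpoints are saved as `model.ckpt-<step>` with an accompanying `.index` file, as the file names later in this thread suggest. TensorFlow itself also records the latest checkpoint in a small text file named `checkpoint` in the same folder, and `tf.train.latest_checkpoint` reads that file.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming: each checkpoint is a group of files sharing one prefix, e.g.
# model.ckpt-45700.index, model.ckpt-45700.meta, model.ckpt-45700.data-00000-of-00001
_INDEX_RE = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(train_dir: str) -> Optional[str]:
    """Return the prefix of the highest-step checkpoint in train_dir, or None.

    Hypothetical helper: it only looks at file names and does not open
    or validate the checkpoint files.
    """
    directory = Path(train_dir)
    if not directory.is_dir():
        return None

    best_step = -1
    best_prefix = None
    for entry in directory.iterdir():
        match = _INDEX_RE.match(entry.name)
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            # Forward slashes, matching the style of the config paths in this thread.
            best_prefix = (directory / match.group(1)).as_posix()
    return best_prefix
```

Calling `latest_checkpoint_prefix("training")` returns something like `training/model.ckpt-45700` when such files exist, and `None` when the folder is missing or holds no checkpoints.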

It doesn't resume training. It starts with Step 0 again. Can anyone resolve it?

tensorhunter avatar Jul 31 '18 01:07 tensorhunter

I solved it by changing fine_tune_checkpoint in the config file in /training from

fine_tune_checkpoint: "C:/tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"

to the path of my last checkpoint, something like:

fine_tune_checkpoint: "C:/tensorflow1/models/research/object_detection/training/model_45700.ckpt"

tolotrasamuel avatar Jul 31 '18 10:07 tolotrasamuel

With the new API, the fine_tune_checkpoint path suggested above (the one ending in model_45700.ckpt) won't work. It has to look like this:

fine_tune_checkpoint: "C:/tensorflow1/models/research/object_detection/training/model.ckpt-45700"

zubairahmed-ai avatar Nov 08 '18 02:11 zubairahmed-ai
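Editing the config by hand works, but the same change can be scripted. The following is a rough sketch, not part of the Object Detection API: `set_fine_tune_checkpoint` is a hypothetical helper that treats the config as plain text and swaps the quoted value of the first `fine_tune_checkpoint` entry. It assumes the value is written on one line in double quotes, as in the examples above.

```python
import re

# Matches `fine_tune_checkpoint: "<anything>"`. A key such as
# `fine_tune_checkpoint_type` is not matched, because the pattern
# requires only optional whitespace between the name and the colon.
_FINE_TUNE_RE = re.compile(r'(fine_tune_checkpoint\s*:\s*)"[^"]*"')


def set_fine_tune_checkpoint(config_text: str, checkpoint_prefix: str) -> str:
    """Return config_text with fine_tune_checkpoint pointing at checkpoint_prefix.

    Hypothetical helper: checkpoint_prefix is expected in the
    `.../model.ckpt-<step>` form described in the comment above.
    """
    # Use forward slashes, like the paths quoted in this thread.
    prefix = checkpoint_prefix.replace("\\", "/")
    new_text, count = _FINE_TUNE_RE.subn(
        lambda m: m.group(1) + '"' + prefix + '"',  # a function avoids escape handling
        config_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no fine_tune_checkpoint entry found in the config text")
    return new_text
```

A typical use would be to read the config file into a string, pass it through the helper together with the new prefix, and write the result back. Keeping a backup copy of the original config first is a sensible precaution.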

patrickmlaw wrote: "If you run the train script, it automatically picks up the last checkpoint and resumes training from there."

This works for me with tensorflow-gpu v1.12.0.

mudasar477 avatar Dec 10 '19 17:12 mudasar477

It works in my environment (tensorflow-gpu v1.12.0). You can set the fine_tune_checkpoint value in your config file to the last stored model.ckpt-XXXXX, where XXXXX is the step count of your training run:

fine_tune_checkpoint: "voc/train_dir/model.ckpt-XXXXX"

SovietLiu6tot avatar Feb 20 '20 12:02 SovietLiu6tot

Hi, can someone please confirm how to resume the training process from the last checkpoint? I made the necessary changes in the config file, but without success:

fine_tune_checkpoint: "C:/tensorflow1/models/research/object_detection/training/model.ckpt-70000"

After this change, my training resumes from the last checkpoint and then stops after step 70001.

Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from training/model.ckpt-70000
INFO:tensorflow:Restoring parameters from training/model.ckpt-70000
WARNING:tensorflow:From C:\Users\Yousaf\anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\training\saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
INFO:tensorflow:Recording summary at step 70000.
INFO:tensorflow:Recording summary at step 70000.
INFO:tensorflow:global step 70001: loss = 0.2056 (31.675 sec/step)
INFO:tensorflow:global step 70001: loss = 0.2056 (31.675 sec/step)
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
INFO:tensorflow:Finished training! Saving model to disk.
WARNING:tensorflow:From C:\Users\Yousaf\anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\training\saver.py:966: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
WARNING:tensorflow:From C:\Users\Yousaf\anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\training\saver.py:966: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
C:\Users\Yousaf\anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\summary\writer\writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "

Please confirm. Thank you

yousaf-safdar avatar May 17 '20 23:05 yousaf-safdar
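One possible explanation for the log above, not confirmed anywhere in this thread: the train_config block of the config normally has a num_steps entry, and if the restored global step has already reached that value, the trainer may have nothing left to do and stop right away. The sketch below compares the two numbers. The helper names are hypothetical, and it assumes the config is plain text with a `num_steps: <number>` entry and that the checkpoint prefix ends in `model.ckpt-<step>`.

```python
import re
from typing import Optional


def checkpoint_step(checkpoint_prefix: str) -> Optional[int]:
    """Extract the step number from a prefix ending in model.ckpt-<step>."""
    match = re.search(r"model\.ckpt-(\d+)$", checkpoint_prefix)
    return int(match.group(1)) if match else None


def configured_num_steps(config_text: str) -> Optional[int]:
    """Find the first `num_steps: <number>` entry in the config text, if any."""
    match = re.search(r"\bnum_steps\s*:\s*(\d+)", config_text)
    return int(match.group(1)) if match else None


def steps_remaining(config_text: str, checkpoint_prefix: str) -> Optional[int]:
    """Steps left before num_steps is reached, or None if either value is unknown."""
    step = checkpoint_step(checkpoint_prefix)
    limit = configured_num_steps(config_text)
    if step is None or limit is None:
        return None
    return max(limit - step, 0)
```

If `steps_remaining` returns 0 for the config and checkpoint in question, raising num_steps in the config is worth trying before looking for other causes.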

@yousaf-safdar Did you find a solution? My training doesn't even resume. It starts from step 0 again, even though I have specified the new checkpoint.

lksh-stats avatar Oct 05 '21 05:10 lksh-stats