FastMaskRCNN

Can training resume from where it stopped before the computer shut down?

Open CodeIsWorld opened this issue 7 years ago • 2 comments

While I was training the example, the computer crashed, so I rebooted it and ran 'python train/train.py' again. I would like to know whether training starts from the beginning or resumes from where it stopped. I know it checks the checkpoint file, but does it start from the first image again? I ask because the iteration counter starts from 1.

CodeIsWorld, Jul 21 '17 08:07

Anyone on this? When I quit and restart the training, it looks like it restarts from zero, although some model files from the previous run have been saved. So is it possible to break the training down into several runs? Also, is there a way to properly quit the training? (For now I use Ctrl+C.)

Thanks for your help Tets

Tetsujinfr, Jul 27 '17 01:07

@CodeIsWorld @Tetsujinfr I tried to restore the model, but I am not sure if my understanding is correct. In train.py, there's a function called restore, and I think this function can restore the trained model.

The reason training starts from zero when you restart the program is the following loop in train.py:

    for step in range(FLAGS.max_iters):

In my code, I get the restored iteration number by adding the following lines to the restore function:

    stem = os.path.splitext(os.path.basename(checkpoint_path))[1]
    global global_iter
    global_iter = int(stem.split('-')[1])

and I modified the for loop to:

    for step in range(global_iter, FLAGS.max_iters):
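The same idea can be tried outside train.py. The snippet below is only a minimal sketch, assuming checkpoint paths end in "-<step>" (for example "model.ckpt-2500"). The helper names step_from_checkpoint and remaining_steps and the example paths are hypothetical and are not part of FastMaskRCNN.

```python
import os
import re


def step_from_checkpoint(checkpoint_path):
    """Return the trailing step number of a checkpoint path, or 0 if there is none.

    Assumes names of the form '<prefix>-<step>', e.g. 'model.ckpt-2500'.
    """
    name = os.path.basename(checkpoint_path or "")
    match = re.search(r"-(\d+)$", name)
    return int(match.group(1)) if match else 0


def remaining_steps(checkpoint_path, max_iters):
    """Iterate from the restored step up to max_iters instead of from 0."""
    return range(step_from_checkpoint(checkpoint_path), max_iters)


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    path = "output/example/model.ckpt-2500"
    steps = remaining_steps(path, 2505)
    print("resuming at step", steps[0], "with", len(steps), "steps left")
```

Falling back to 0 when no step can be parsed keeps a fresh run working when there is no checkpoint yet. Note that this only changes the loop counter. Whether the input pipeline also resumes at the same image depends on how the data queue is set up, which this thread does not settle.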

QtSignalProcessing, Sep 11 '17 20:09