pytorch-deeplab-xception
Resume Training Problem
Hello, I am trying to resume my training, but the loaded model doesn't seem to give the expected starting point. For example, I train until epoch 50, reaching an mIoU of 0.63. Then I pass --resume with the last checkpoint. The checkpoint loads successfully and training continues from epoch 51 with the continued LR. However, for the first several epochs the mIoU is only 0.32 to 0.33. This is strange: the model should start at roughly 0.60 to 0.63.
I think this may be because you have not saved the optimizer state.
@stillwaterman What do you mean by that? The optimizer state should be saved by this code in train.py at lines 167-176, especially the line 'optimizer': self.optimizer.state_dict(). Am I right?
```python
new_pred = mIoU
if new_pred > self.best_pred:
    is_best = True
    self.best_pred = new_pred
    self.saver.save_checkpoint({
        'epoch': epoch + 1,
        'state_dict': self.model.module.state_dict(),
        'optimizer': self.optimizer.state_dict(),
        'best_pred': self.best_pred,
    }, is_best)
```
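For what it's worth, saving the optimizer is only half of the round trip: on resume, both the model weights and the optimizer state have to be loaded back, or momentum buffers start from zero and accuracy can dip. Here is a minimal, self-contained sketch of the save/load cycle using a toy model and a temporary path (the variable names and the toy `nn.Linear` are illustrative, not the repo's actual code; in this repo the weights are saved from `self.model.module` because the model is wrapped in `DataParallel`, so they must be loaded back into the unwrapped module as well):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Toy stand-ins for the real model/optimizer (illustrative only).
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# --- save: mirrors the checkpoint dict in the snippet above ---
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pth.tar')
torch.save({
    'epoch': 51,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'best_pred': 0.63,
}, path)

# --- resume: restore BOTH the weights and the optimizer state ---
checkpoint = torch.load(path)
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch']       # continue training from here
best_pred = checkpoint['best_pred']     # so the saver keeps comparing correctly
```

Note that this repo's saver only writes the checkpoint when `is_best` is true, so the file you resume from may correspond to the best epoch, not necessarily the last one.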
@herleeyandi Were you able to find out what the problem was? I'm facing the same issue.