Resume Training using saved checkpoints.
Hi again,
I want to know how we can use the saved checkpoints to resume training.
I used the following code for this, but it gave me a few warnings and I am not sure whether it loads the weights correctly:
```
import glob
import torch

if opt.resume >= 0:
    # Load the whole pickled model object saved at epoch opt.resume
    model_param_file = glob.glob('%s/checkpoint_%s*.model' % (opt.path_to_chkpt_folder, opt.resume))
    net = torch.load(model_param_file[0])
```
`opt.resume` is the epoch number I want to resume training from.
Thanks!
Could you post the warning messages? I think it should load the weights without problems.
Sorry for the late reply. I was in the middle of some experiments, so I could not get back to this earlier.
Here is the warning I get:
```
/home/user/anaconda2/envs/py36/lib/python3.6/site-packages/torch/serialization.py:434: SourceChangeWarning: source code of class 'model2.MACNetwork' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
```
Also, one more question: could you explain what the `accumulate()` function in train.py does? I could not fully understand its purpose, except that it seems to apply weight decay before validation.
This means the model's source code has changed since the model was saved. You can avoid this warning by saving `model.state_dict()` instead of the whole model object. However, if you changed the model itself (its layers or how it computes its outputs), the saved weights will not work as expected.
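A minimal sketch of the `state_dict` approach, assuming a checkpoint dictionary layout chosen for illustration: the keys `epoch`, `model_state`, `optimizer_state` and the helpers `save_checkpoint`/`load_checkpoint` are hypothetical, not part of this repository. It also stores the optimizer state so training resumes with the same optimizer statistics; `nn.Linear` stands in for `MACNetwork`.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    # Store only tensors and plain values, not the pickled model class,
    # so later edits to the class source do not trigger SourceChangeWarning.
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    # The model must already be built from the current source code;
    # this only copies the saved weights and optimizer state into it.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    # Training should continue from the epoch after the saved one.
    return checkpoint["epoch"] + 1


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(4, 2)  # hypothetical stand-in for MACNetwork
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    path = os.path.join(tempfile.mkdtemp(), "checkpoint_3.pt")
    save_checkpoint(path, model, optimizer, epoch=3)

    fresh = nn.Linear(4, 2)
    fresh_opt = torch.optim.Adam(fresh.parameters(), lr=1e-3)
    start_epoch = load_checkpoint(path, fresh, fresh_opt)
    print(start_epoch)  # 4
    print(torch.equal(fresh.weight, model.weight))  # True
```

The trade-off is that you must construct the model in code before calling `load_state_dict`, which raises an error if parameter names or shapes no longer match the saved ones.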
The `accumulate()` function computes a running average of the model weights and stores it in `net_running`. In the paper, the authors used a moving average of the model weights for evaluation. It would also be better to save the running-average model so it can be used later. Commit 564ca5b will resolve this.
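To make the idea concrete, here is a sketch of an exponential moving average of the weights. The function name mirrors `accumulate()` in train.py, but the body and the `decay=0.999` default are assumptions for illustration, not the repository's code.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def accumulate(running_model, model, decay=0.999):
    # Exponential moving average of parameters:
    # running = decay * running + (1 - decay) * current
    running_params = dict(running_model.named_parameters())
    for name, param in model.named_parameters():
        running_params[name].mul_(decay).add_(param, alpha=1 - decay)


if __name__ == "__main__":
    net = nn.Linear(1, 1, bias=False)
    net_running = copy.deepcopy(net)  # averaged copy, never trained directly
    with torch.no_grad():
        net_running.weight.fill_(0.0)
        net.weight.fill_(1.0)
    accumulate(net_running, net, decay=0.9)
    print(net_running.weight.item())  # approximately 0.1
```

In this sketch the averaged copy is updated after each optimizer step and then used for validation. It averages parameters only, not buffers such as BatchNorm running statistics.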
Okay, thanks for explaining it to me. I appreciate it! I will get back to you if I need more help.