GMFlowNet
About resuming training
```python
if args.restore_ckpt is not None:
    # Recover the step count from the checkpoint file name,
    # e.g. '100000_gmflownet.pth' -> 100000.
    strStep = os.path.split(args.restore_ckpt)[-1].split('_')[0]
    total_steps = int(strStep) if strStep.isdigit() else 0
else:
    total_steps = 0
```
Hi, I found this code in train.py. The OneCycleLR class needs the parameter 'last_epoch' to resume the learning-rate schedule, and that in turn requires saving the optimizer's state_dict, because the scheduler looks for a key called 'initial_lr'. However, I can't find any code that saves the optimizer's state_dict. Why? Do you have to rename the model file when you load it? That would allow this code to be skipped.
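For reference, here is a minimal sketch of what a fully resumable checkpoint could look like (the dictionary keys and function names are my own assumptions, not code from this repo):

```python
import torch

def save_checkpoint(path, model, optimizer, scheduler, total_steps):
    torch.save({
        'total_steps': total_steps,
        'model': model.state_dict(),
        # Once a scheduler has been attached, each param group also
        # carries 'initial_lr', which is what OneCycleLR checks for.
        'optimizer': optimizer.state_dict(),
        'scheduler': scheduler.state_dict(),  # includes the step counter
    }, path)

def load_checkpoint(path, model, optimizer, scheduler):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt['model'])
    optimizer.load_state_dict(ckpt['optimizer'])
    scheduler.load_state_dict(ckpt['scheduler'])
    return ckpt['total_steps']
```

With something like this, OneCycleLR picks up exactly where it left off: optimizer.load_state_dict restores 'initial_lr' in the param groups, and scheduler.load_state_dict restores the step counter.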
Hey,
We didn't save the optimizer's state_dict. The total_steps is recovered from the checkpoint file name (it usually begins with the number of iterations). So when resuming training, you may see some differences from your original training run.
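For concreteness, RAFT-style training scripts save checkpoints with the step count at the front of the file name, which is what the snippet above parses back out. Roughly like this (a sketch of the pattern, not necessarily the exact line in this repo):

```python
PATH = 'checkpoints/%d_%s.pth' % (total_steps + 1, args.name)
torch.save(model.state_dict(), PATH)  # model weights only, no optimizer/scheduler
```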
Thank you for replying. Look at this code at line 32 of OneCycleLR:

```python
if 'initial_lr' not in group:
    raise KeyError("param 'initial_lr' is not specified "
                   "in param_groups[{}] when resuming an optimizer".format(i))
```

My question is: there is no code in train.py that adds the 'initial_lr' parameter to the optimizer, so if 'last_epoch' is set in OneCycleLR, wouldn't an error be thrown at runtime?
Hey, why not give it a try? I think initial_lr may be set somewhere in RAFT's codebase.
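If it turns out not to be, one workaround is to fill in the keys that OneCycleLR would otherwise create itself. A minimal sketch, assuming train.py's args, optimizer, and the total_steps parsed above, with PyTorch's default div_factor / final_div_factor and RAFT-style scheduler hyperparameters:

```python
import torch

# OneCycleLR writes 'initial_lr' / 'max_lr' / 'min_lr' into the param
# groups only when constructed with last_epoch == -1, so when resuming
# without a saved optimizer state_dict we have to add them ourselves.
div_factor, final_div_factor = 25.0, 1e4  # PyTorch defaults
for group in optimizer.param_groups:
    group['initial_lr'] = args.lr / div_factor
    group['max_lr'] = args.lr
    group['min_lr'] = group['initial_lr'] / final_div_factor

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=args.lr, total_steps=args.num_steps + 100,
    pct_start=0.05, cycle_momentum=False, anneal_strategy='linear',
    last_epoch=total_steps - 1)  # resume the schedule at total_steps
```

Note this only resumes the learning-rate schedule; the optimizer's internal moments (e.g. AdamW's running averages) are still lost, which is where the small differences from the original run come from.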
When you use warm-start, how do you compare your experimental results with the 2-view ones?
Hey, we only report the 2-view results. You may check the caption of Table 2 in the paper or run the experiments on your own.