
About resume training

Open Eryo-iPython opened this issue 3 years ago • 5 comments

if args.restore_ckpt is not None:
    strStep = os.path.split(args.restore_ckpt)[-1].split('_')[0]
    total_steps = int(strStep) if strStep.isdigit() else 0
else:
    total_steps = 0

Hi, I found this code in train.py. The OneCycleLR class needs the 'last_epoch' parameter to resume the learning rate schedule, which in turn requires saving the optimizer's state_dict, because the param groups must contain a key called 'initial_lr'. However, I can't find any code that saves the optimizer's state_dict. Why is that? Do you have to rename the model file when you load it? That would allow this code to be skipped.
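To make the file-name question concrete, here is a minimal sketch of the same parsing logic wrapped in a function. The function name and the example paths are hypothetical; only the parsing expression comes from the snippet above.

```python
import os


def steps_from_checkpoint_name(path):
    """Return the step count encoded before the first '_' in the file name, else 0."""
    prefix = os.path.split(path)[-1].split('_')[0]
    return int(prefix) if prefix.isdigit() else 0


# Hypothetical file names, used only for illustration.
print(steps_from_checkpoint_name('checkpoints/50000_gmflownet-things.pth'))
print(steps_from_checkpoint_name('pretrained_models/gmflownet-things.pth'))
```

With this logic, a file whose name does not start with digits followed by an underscore falls back to total_steps = 0, so renaming the file is one way to control where the step counter starts.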

Eryo-iPython avatar Oct 02 '22 08:10 Eryo-iPython

Hey, we didn't save the optimizer's state_dict. total_steps is obtained from the file name (usually it ends with the number of iterations). So when you resume training, you may see some differences from your original training process.
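If an exact continuation matters, one option is to keep the optimizer and scheduler state next to the weights. The following is only a sketch under assumptions: the toy model, the hyperparameters, the helper name, and the checkpoint keys are all hypothetical and are not taken from this repository's train.py.

```python
import torch
from torch import nn, optim


def make_optimizer_and_scheduler(model, num_steps=100):
    # Hypothetical hyperparameters, chosen only for this illustration.
    optimizer = optim.AdamW(model.parameters(), lr=1e-3)
    scheduler = optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=1e-3, total_steps=num_steps,
        pct_start=0.05, cycle_momentum=False, anneal_strategy='linear')
    return optimizer, scheduler


model = nn.Linear(4, 2)
optimizer, scheduler = make_optimizer_and_scheduler(model)

# Run a few training steps so that there is some state worth saving.
for _ in range(10):
    optimizer.zero_grad()
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()
    scheduler.step()

# Bundle everything needed to continue; the key names are an assumption.
checkpoint = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'scheduler': scheduler.state_dict(),
    'total_steps': 10,
}

# Resume: build fresh objects first, then load the saved state into them.
resumed_model = nn.Linear(4, 2)
resumed_optimizer, resumed_scheduler = make_optimizer_and_scheduler(resumed_model)
resumed_model.load_state_dict(checkpoint['model'])
resumed_optimizer.load_state_dict(checkpoint['optimizer'])
resumed_scheduler.load_state_dict(checkpoint['scheduler'])
```

In a real script the dictionary would be written with torch.save and read back with torch.load. Because the optimizer's param groups are restored along with the scheduler state, the scheduler does not have to be constructed with last_epoch in this variant.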

xiaofeng94 avatar Oct 02 '22 14:10 xiaofeng94

Thank you for replying.

if 'initial_lr' not in group:
    raise KeyError("param 'initial_lr' is not specified "
                   "in param_groups[{}] when resuming an optimizer".format(i))

Look at this code at line 32 of OneCycle. My question is: there is no code in train.py that adds the 'initial_lr' parameter to the optimizer, so if 'last_epoch' is set in OneCycle, wouldn't an error be thrown at runtime?

Eryo-iPython avatar Oct 03 '22 00:10 Eryo-iPython

Hey, why not give it a try? I think initial_lr may be set somewhere in RAFT's codebase.
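Giving it a try might look like the sketch below. It assumes a recent PyTorch release and uses a hypothetical toy model with made-up hyperparameters, not the actual train.py setup. On a freshly created optimizer, passing a non-default last_epoch to OneCycleLR is expected to raise the KeyError quoted above, since nothing has written 'initial_lr' into the param groups yet.

```python
from torch import nn, optim

# Toy model and optimizer; values are placeholders for illustration only.
model = nn.Linear(4, 2)
optimizer = optim.AdamW(model.parameters(), lr=1e-3)


def try_resume(optimizer, last_epoch):
    """Build a OneCycleLR with the given last_epoch; return the error text, or None."""
    try:
        optim.lr_scheduler.OneCycleLR(
            optimizer, max_lr=1e-3, total_steps=100,
            cycle_momentum=False, last_epoch=last_epoch)
    except KeyError as err:
        return str(err)
    return None


error_message = try_resume(optimizer, last_epoch=10)
print(error_message)
```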

xiaofeng94 avatar Oct 03 '22 15:10 xiaofeng94

How do you compare the results of your experiments with the 2-view ones when you use warm start?

Eryo-iPython avatar Oct 10 '22 06:10 Eryo-iPython

Hey, we only report the 2-view results. You may check the caption of Table 2 in the paper or run the experiments on your own.

xiaofeng94 avatar Oct 10 '22 16:10 xiaofeng94