GMFlowNet About resume training

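import os

# Recover the iteration count from the checkpoint file name, which is expected
# to start with "<steps>_"; fall back to 0 if there is no numeric prefix.
if args.restore_ckpt is not None:
    strStep = os.path.split(args.restore_ckpt)[-1].split('_')[0]
    total_steps = int(strStep) if strStep.isdigit() else 0
else:
    total_steps = 0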
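The snippet depends on a naming convention: the checkpoint's file name has to begin with the step count followed by an underscore. Here is a minimal sketch of the same parsing logic as a standalone function. The function name and the RAFT-style file names such as '100000_gmflownet-things.pth' are hypothetical and only for illustration. It shows that a file without a numeric prefix silently restarts the step counter at 0:

```python
import os


def steps_from_ckpt_name(restore_ckpt):
    """Hypothetical helper mirroring the train.py snippet.

    Returns the integer before the first '_' in the file name,
    or 0 if there is no checkpoint or that prefix is not all digits.
    """
    if restore_ckpt is None:
        return 0
    prefix = os.path.split(restore_ckpt)[-1].split('_')[0]
    return int(prefix) if prefix.isdigit() else 0


for path in ["checkpoints/100000_gmflownet-things.pth",  # numeric prefix
             "checkpoints/gmflownet-things.pth",         # no prefix at all
             "checkpoints/gmflownet_things.pth",         # prefix is not a number
             None]:                                      # training from scratch
    print(path, "->", steps_from_ckpt_name(path))
```

The fallback is silent. A renamed or hand-named checkpoint therefore resumes with total_steps = 0 and does not raise an error.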

Hi, I found this code in train.py. The OneCycleLR class needs the 'last_epoch' parameter to resume the learning-rate schedule. Resuming that way requires the optimizer's state_dict to be saved, because the scheduler expects a key called 'initial_lr'. However, I can't find any code that saves the optimizer's state_dict. Why is that? Do you have to rename the model file when you load it? That would allow this code to be skipped.

Oct 02 '22 08:10 Eryo-iPython
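Saving the optimizer's state_dict is one common way to make OneCycleLR resumable in PyTorch. OneCycleLR stores 'initial_lr', 'max_lr' and 'min_lr' in the optimizer's param_groups, and optimizer.state_dict() keeps those extra keys. scheduler.state_dict() keeps the step counter (last_epoch). The sketch below is not GMFlowNet's code. The tiny nn.Linear model, the hyperparameters and the helper names are hypothetical stand-ins, and the checkpoint is kept in memory rather than written with torch.save. It trains for 30 steps, rebuilds fresh objects, and then restores them from the saved state:

```python
import torch
from torch import nn, optim

# Hypothetical hyperparameters for the sketch; not GMFlowNet's real config.
MAX_LR, NUM_STEPS = 4e-4, 100


def make_training_objects():
    """Build a toy model, an AdamW optimizer and a linear OneCycleLR schedule."""
    model = nn.Linear(4, 2)
    optimizer = optim.AdamW(model.parameters(), lr=MAX_LR)
    scheduler = optim.lr_scheduler.OneCycleLR(
        optimizer, MAX_LR, NUM_STEPS, pct_start=0.05,
        cycle_momentum=False, anneal_strategy="linear")
    return model, optimizer, scheduler


def train_steps(model, optimizer, scheduler, n):
    """Run n optimizer/scheduler steps on random data."""
    for _ in range(n):
        loss = model(torch.randn(8, 4)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()


torch.manual_seed(0)
model, optimizer, scheduler = make_training_objects()
train_steps(model, optimizer, scheduler, 30)

# What a resumable checkpoint would hold (normally written with torch.save).
ckpt = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),  # param_groups keep 'initial_lr', 'max_lr', 'min_lr'
    "scheduler": scheduler.state_dict(),  # last_epoch and step counters
}

# Resume: build fresh objects first, then overwrite them with the saved state.
model2, optimizer2, scheduler2 = make_training_objects()
model2.load_state_dict(ckpt["model"])
optimizer2.load_state_dict(ckpt["optimizer"])
scheduler2.load_state_dict(ckpt["scheduler"])

print(scheduler2.last_epoch, scheduler2.get_last_lr(), scheduler.get_last_lr())
```

The order of operations matters. Constructing OneCycleLR with the default last_epoch=-1 overwrites 'initial_lr' and resets the learning rate to its starting value. The load_state_dict calls that follow put the saved learning rate and step counter back.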

Hey, we didn't save the optimizer's state_dict. total_steps is obtained from the file name, which usually begins with the number of iterations. So when you resume training, you may see some differences from the original training run.

Oct 02 '22 14:10 xiaofeng94
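Recovering total_steps from the file name lets the learning-rate schedule be recomputed, but any optimizer state that was not checkpointed still starts from scratch. Assuming GMFlowNet keeps RAFT's AdamW optimizer, Adam's first and second moment estimates and its bias-correction step counter are reset on resume. That is one plausible source of the differences mentioned above. Here is a toy, pure-Python illustration using scalar Adam (no weight decay) to minimize f(x) = x**2. The learning rate is held constant, standing in for a schedule that is correctly recomputed from the step count:

```python
import math


def adam_run(x, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, resume_at=None):
    """Minimise f(x) = x**2 with scalar Adam (no weight decay).

    If resume_at is given, the optimizer state is discarded at that step,
    mimicking a resume from a checkpoint that stored only the weights.
    """
    m = v = 0.0  # first and second moment estimates
    t = 0        # Adam's step counter used for bias correction
    for step in range(steps):
        if step == resume_at:
            m = v = 0.0
            t = 0
        grad = 2.0 * x
        t += 1
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


full = adam_run(1.0, 20)
resumed = adam_run(1.0, 20, resume_at=10)
print(full, resumed)  # the two end points differ
```

Discarding the state at step 0 is a no-op, so adam_run(1.0, 20, resume_at=0) matches the uninterrupted run. Discarding it mid-run changes the trajectory even though the weights and the step count were restored.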

Thank you for replying. Look at this code at line 32 of OneCycle:

for i, group in enumerate(optimizer.param_groups):
    if 'initial_lr' not in group:
        raise KeyError("param 'initial_lr' is not specified "
                       "in param_groups[{}] when resuming an optimizer".format(i))

My question is this: train.py has no code that adds 'initial_lr' to the optimizer's param groups. So if 'last_epoch' is set in OneCycle, wouldn't an error be thrown at runtime?

Oct 03 '22 00:10 Eryo-iPython
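This can be checked directly. With a freshly created PyTorch optimizer, passing last_epoch != -1 to OneCycleLR raises the KeyError quoted above, because OneCycleLR only fills in 'initial_lr', 'max_lr' and 'min_lr' when last_epoch == -1. If the optimizer state was not saved, one workaround is to set those three keys by hand before building the scheduler. The sketch makes several assumptions:

- OneCycleLR's default div_factor=25 and final_div_factor=1e4 are in effect.
- The key names 'max_lr' and 'min_lr' appear to be internal details of PyTorch's implementation rather than documented API.
- The hyperparameters are hypothetical.

The scheduler's constructor performs one step itself, so the sketch passes last_epoch = steps_done - 1:

```python
import torch
from torch import nn, optim

# Hypothetical hyperparameters for the sketch (not taken from GMFlowNet).
MAX_LR, NUM_STEPS, STEPS_DONE = 4e-4, 100, 30
DIV_FACTOR, FINAL_DIV_FACTOR = 25.0, 1e4  # OneCycleLR's default values


def make_scheduler(optimizer, last_epoch=-1):
    return optim.lr_scheduler.OneCycleLR(
        optimizer, MAX_LR, NUM_STEPS, pct_start=0.05, cycle_momentum=False,
        anneal_strategy="linear", last_epoch=last_epoch)


# Reference: an uninterrupted schedule stepped STEPS_DONE times.
ref_opt = optim.AdamW(nn.Linear(4, 2).parameters(), lr=MAX_LR)
ref_sched = make_scheduler(ref_opt)
for _ in range(STEPS_DONE):
    ref_opt.step()  # no gradients here, so the parameters are untouched
    ref_sched.step()

# 1) Resuming with a fresh optimizer: 'initial_lr' is missing -> KeyError.
opt = optim.AdamW(nn.Linear(4, 2).parameters(), lr=MAX_LR)
try:
    make_scheduler(opt, last_epoch=STEPS_DONE - 1)
    raised_key_error = False
except KeyError:
    raised_key_error = True

# 2) Workaround: recreate the keys OneCycleLR would have added at step 0.
for group in opt.param_groups:
    group["initial_lr"] = MAX_LR / DIV_FACTOR
    group["max_lr"] = MAX_LR
    group["min_lr"] = group["initial_lr"] / FINAL_DIV_FACTOR
sched = make_scheduler(opt, last_epoch=STEPS_DONE - 1)  # constructor steps once

print("KeyError raised:", raised_key_error)
print("resumed lr:", sched.get_last_lr(), "reference lr:", ref_sched.get_last_lr())
```

This only restores the learning-rate schedule. The optimizer's moment estimates still start from zero.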

Hey, why not give it a try? I think 'initial_lr' may be set somewhere in RAFT's codebase.

Oct 03 '22 15:10 xiaofeng94

How do you compare the results of your warm-start experiments with the 2-view ones?

Oct 10 '22 06:10 Eryo-iPython

Hey, we only report the 2-view results. You may check the caption of Table 2 in the paper or run the experiments on your own.

Oct 10 '22 16:10 xiaofeng94