yolor: about resume training

Hello, I have trained a yolor-p6 model on my dataset for 1000 epochs. However, when I tried to fine-tune the network by loading the weights from epoch 300, training started again from epoch zero. Is this normal, or did I simply fail to load the old weights? And how can I tell whether the old weights were loaded successfully?

Dec 28 '21 04:12 amaze567

As long as you load the "checkpoint.pt" file as your weights, it should be fine. The epoch count refers to the training run you are currently executing, so it is normal for it to start from zero.

Dec 28 '21 10:12 Wazaki-Ou
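
A minimal sketch of why the epoch counter can restart at zero, assuming a YOLOv5-style train.py in which the checkpoint is a dict whose `epoch` entry holds the last completed epoch, and in which a checkpoint finalized at the end of training stores `epoch = -1`. Both are assumptions to check against your copy of yolor's train.py; `start_epoch_from_checkpoint` is a hypothetical helper, not a function in the repo.

```python
def start_epoch_from_checkpoint(ckpt, resume=False):
    """Return the epoch index a new run should start from.

    Assumes (hypothetically) that ckpt["epoch"] is the last completed epoch,
    and that it is -1 (or missing) for a checkpoint finalized after training.
    """
    last_epoch = ckpt.get("epoch")
    if last_epoch is None:
        last_epoch = -1  # no epoch recorded: treat as a fresh start
    start_epoch = last_epoch + 1
    if resume and start_epoch == 0:
        # Resuming only makes sense for an interrupted run
        raise ValueError("checkpoint records no unfinished run to resume")
    return start_epoch


# Interrupted run saved after epoch 299: resuming continues at epoch 300
print(start_epoch_from_checkpoint({"epoch": 299}, resume=True))  # 300
# Finalized checkpoint used as initial weights: the counter restarts at 0
print(start_epoch_from_checkpoint({"epoch": -1}))  # 0
```

Under these assumptions, an epoch counter of 0 says nothing about whether the weights themselves were loaded; it only reflects which epoch the new run starts from.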

@Wazaki-Ou Thanks for your reply. I still have a question, though. The loss at the first epoch of fine-tuning is 0.1604, but the checkpoint I loaded had already been trained down to 0.02xx. Shouldn't they be the same, or at least close? That's why I suspect the program did not actually load the checkpoint file.

Dec 28 '21 15:12 amaze567
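
One way to check whether a checkpoint was actually loaded is to count how many parameter tensors match the model by name and shape. The sketch below works on plain dicts mapping parameter names to shape tuples; with PyTorch you could build them as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. The function names and the example parameter names are hypothetical.

```python
def matching_keys(ckpt_shapes, model_shapes, exclude=()):
    """Return names present in both dicts with identical shapes.

    Names containing any substring listed in `exclude` are skipped.
    """
    return {
        name
        for name, shape in ckpt_shapes.items()
        if name in model_shapes
        and model_shapes[name] == shape
        and not any(pattern in name for pattern in exclude)
    }


def transfer_report(ckpt_shapes, model_shapes):
    """Summarize how much of the model the checkpoint covers."""
    matched = matching_keys(ckpt_shapes, model_shapes)
    return f"Transferred {len(matched)}/{len(model_shapes)} items"


# Hypothetical parameter names and shapes
model = {"conv1.weight": (32, 3, 3, 3), "conv1.bias": (32,), "head.weight": (255, 256, 1, 1)}
ckpt = {"conv1.weight": (32, 3, 3, 3), "conv1.bias": (32,), "head.weight": (18, 256, 1, 1)}
print(transfer_report(ckpt, model))  # Transferred 2/3 items
```

A count close to the total suggests the weights were loaded (here the head differs, as it would if the number of classes changed); a count near zero means the model is effectively training from scratch.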

@amaze567 To be honest, I'm not sure whether that is incorrect behavior. I hope someone who understands better how resuming works can help.

Dec 29 '21 07:12 Wazaki-Ou

@Wazaki-Ou OK. Thanks for your reply anyway. :)

Dec 29 '21 08:12 amaze567

@amaze567 I think I have the same issue: the checkpoint did not actually load, so it is training as if with --weights ''. Have you run into a problem where, after reloading the checkpoint .pt, the epochs do not start at 0? I seem to be having this issue when I load my checkpoint.

Jan 03 '22 22:01 Wilbertbh-Tan

@Wilbertbh-Tan Hi, I am still facing the same issue. I have tried reloading the old weights many times, but training still starts from epoch zero. Have you made any progress on it?

Jan 10 '22 07:01 amaze567

@amaze567 Yes. First, make sure the path to your weights is correct; if it isn't, training will start from scratch. When you resume training by running train.py, it should continue from where you left off. To change this for fine-tuning, I edited the train.py script to set the starting epoch to the value I wanted.

I'm not sure whether, in your case, the weights are being trained from scratch or the run is resuming. Can you verify this by checking the log?

Jan 27 '22 05:01 Wilbertbh-Tan
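
The two suggestions above (make sure the weights path is correct, and override the starting epoch for fine-tuning) can be sketched as follows. Both helpers and the `override` parameter are hypothetical illustrations of such an edit, not code from yolor's train.py; an empty string is treated as an explicit request to train from scratch, mirroring `--weights ''`.

```python
from pathlib import Path


def resolve_weights(path_str):
    """Return a Path to the weights file, or None for training from scratch.

    Raises instead of silently falling back to scratch when the path is wrong.
    """
    if path_str == "":
        return None  # explicit request to train from scratch
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"weights file not found: {path}")
    return path


def choose_start_epoch(last_epoch, override=None):
    """Continue after `last_epoch` unless a fine-tuning start epoch is given."""
    if override is not None:
        if override < 0:
            raise ValueError("start epoch must be non-negative")
        return override
    return last_epoch + 1


print(resolve_weights(""))                  # None
print(choose_start_epoch(299))              # 300
print(choose_start_epoch(299, override=0))  # 0
```

Raising on a missing file is a design choice: it turns a silent train-from-scratch into an immediate, visible error.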

I'd like to ask: the learning rate changes when training is resumed. How can I make sure it continues from the previous learning rate?

Jul 10 '22 11:07 qutyyds

It uses the epoch to look up the corresponding learning rate in the schedule.

Jul 10 '22 13:07 WongKinYiu
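
A minimal sketch of looking up the learning rate from the epoch, assuming a cosine schedule of the kind used in YOLOv5-style training scripts; the formula, `lr0`, `lrf`, and the example values are assumptions, so check the scheduler and hyperparameter file your run actually uses. Because the rate here is a pure function of the epoch, a resumed run reproduces the original rates as long as the total number of epochs is unchanged. With a PyTorch scheduler, YOLOv5-style scripts perform this lookup by setting the scheduler's `last_epoch` attribute to `start_epoch - 1`.

```python
import math


def cosine_lr(lr0, lrf, epochs):
    """Return a function mapping epoch -> learning rate.

    Decays from lr0 at epoch 0 to lr0 * lrf at epoch `epochs` (assumed schedule).
    """
    def lr_at(epoch):
        factor = ((1 + math.cos(epoch * math.pi / epochs)) / 2) * (1 - lrf) + lrf
        return lr0 * factor
    return lr_at


def lr_trace(lr0, lrf, epochs, start_epoch=0):
    """Learning rates used for each epoch from start_epoch up to epochs - 1."""
    lr_at = cosine_lr(lr0, lrf, epochs)
    return [lr_at(epoch) for epoch in range(start_epoch, epochs)]


# Illustrative values: 1000 epochs, interrupted and resumed at epoch 300
full = lr_trace(0.01, 0.2, 1000)
resumed = lr_trace(0.01, 0.2, 1000, start_epoch=300)
print(resumed == full[300:])  # True: same epochs, same learning rates

# Changing the total number of epochs reshapes the whole curve
print(cosine_lr(0.01, 0.2, 1000)(300) == cosine_lr(0.01, 0.2, 600)(300))  # False
```

Under this assumed schedule, passing a different total epoch count when resuming is one way the learning rate can appear to "change" after a restart.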
