Fast_Human_Pose_Estimation_Pytorch

Student model overfits very early in training

Open yiwang454 opened this issue 4 years ago • 4 comments

I trained the student model under the supervision of the teacher model (downloaded from the link in the README) together with the labelled data. The validation accuracy dropped quickly at the 8th epoch, and the best validation accuracy was only about 60%, far below the result reported in the paper. Why does this happen, and how can I solve it? Will the result improve if I train for more epochs?

yiwang454 avatar Sep 20 '19 09:09 yiwang454

The default number of training epochs on MPII is 90. The result after 8 epochs may not be very stable. Did you follow the default parameters in this repo?

yuanyuanli85 avatar Sep 20 '19 12:09 yuanyuanli85

Yes, I copied the parameters from your repo. Here is the command I used:

Pytorch$ python example/mpii_kd.py -a hg --stacks 2 --blocks 1 --checkpoint checkpoint/hg_s2_b1/ --mobile=True --teacher_stack 8 --teacher_checkpoint checkpoint/hg_s8_b1/model_best.pth.tar

I have also now trained for 90 epochs (your default setting), but the validation loss kept exploding, and the training accuracy converged at about 79%.

yiwang454 avatar Sep 23 '19 02:09 yiwang454

By the way, I kept the learning rate fixed at 2.5*10^-4 (your default value and also the setting in the paper). Should I keep it fixed, or should I change the learning rate during training?

yiwang454 avatar Sep 23 '19 02:09 yiwang454
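For readers weighing the learning-rate question above, here is a minimal sketch of a step-decay schedule starting from the 2.5*10^-4 base rate mentioned in the thread. The milestone epochs (60 and 80) and the decay factor 0.1 are hypothetical values chosen for illustration, not settings confirmed by the repo or the paper. The helper `step_lr` is likewise illustrative; it follows the same rule as PyTorch's `torch.optim.lr_scheduler.MultiStepLR`, which multiplies the rate by `gamma` each time a milestone epoch is reached.

```python
def step_lr(base_lr, epoch, milestones, gamma=0.1):
    """Return the learning rate at `epoch` under step decay.

    The rate is multiplied by `gamma` once for every milestone
    that has already been reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


BASE_LR = 2.5e-4        # base rate quoted in the thread
MILESTONES = (60, 80)   # hypothetical decay epochs, for illustration only

# Print the rate at a few epochs of a 90-epoch run.
for epoch in (0, 59, 60, 80, 89):
    print(epoch, step_lr(BASE_LR, epoch, MILESTONES))
```

Whether decaying the rate helps with the exploding validation loss here is not established by the thread; the sketch only shows what "changing the learning rate during training" could look like.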

Are you using PyTorch 0.4.x? If so, did you disable cuDNN for the batchnorm layers? There is a known issue in PyTorch that makes training unstable.

yuanyuanli85 avatar Sep 24 '19 08:09 yuanyuanli85
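As a rough illustration of the suggestion above, one way to keep batchnorm off cuDNN is to disable cuDNN globally through the `torch.backends.cudnn.enabled` flag before building the model. This is an assumption about how the advice could be applied: it is coarser than a batchnorm-only change because it turns cuDNN off for every layer, and it may not be the exact fix the maintainer has in mind. The small batchnorm layer and random input below are placeholders, not part of the repo.

```python
import torch
import torch.nn as nn

# Assumption: disabling cuDNN globally is an acceptable stand-in for
# "disable cuDNN for batchnorm"; it affects all layers, not only batchnorm.
torch.backends.cudnn.enabled = False

# Placeholder layer and input, only to show that batchnorm still runs.
bn = nn.BatchNorm2d(3)
x = torch.randn(4, 3, 8, 8)
y = bn(x)

print(torch.backends.cudnn.enabled, tuple(y.shape))
```

Turning cuDNN off usually slows GPU training down, so it is worth checking first whether the installed PyTorch version is actually affected.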