
About EMA

Open · YilinLiu97 opened this issue 5 years ago • 3 comments

Hi, I found that the teacher model's weights do not seem to be getting updated: the teacher performs as badly as when it was first initialized.

    alpha = min(1 - 1 / (global_step + 1), alpha)
    for ema_param, param in zip(ema_model.parameters(), model.parameters()):
        ema_param.data.mul_(alpha).add_(1 - alpha, param.data)

Shouldn't this be ema_param.data.mul_(alpha).add_((1 - alpha)*param.data) ?

Here are the parameters printed out during training:

    ('teacher_p: ', Parameter containing:
     tensor([ 0.0007, -0.0006, 0.0046, -0.0033, 0.0004, 0.0262, 0.0153, -0.0259,
             -0.0115, -0.0015, -0.0117, -0.0060, 0.0161, 0.0104, 0.0080, -0.0015,
             -0.0116, -0.0160, 0.0247, -0.0227, 0.0077, 0.0052, 0.0217, 0.0111,
             -0.0036, -0.0176, -0.0188, 0.0026, -0.0163, 0.0155], device='cuda:0'))
    ('student_p: ', Parameter containing:
     tensor([-0.0322, -0.0153, 0.0206, -0.0212, -0.0274, 0.0293, 0.0225, -0.0279,
             -0.0272, -0.0282, -0.0272, -0.0261, 0.0275, 0.0261, 0.0274, -0.0251,
              0.0014, -0.0285, 0.0296, -0.0296, 0.0105, -0.0209, 0.0123, 0.0227,
             -0.0162, -0.0081, -0.0079, -0.0233, -0.0145, 0.0030], device='cuda:0',
            requires_grad=True))
    ('(after) teacher_p: ', Parameter containing:
     tensor([ 0.0007, -0.0006, 0.0046, -0.0033, 0.0004, 0.0262, 0.0153, -0.0259,
             -0.0115, -0.0016, -0.0117, -0.0060, 0.0161, 0.0104, 0.0080, -0.0015,
             -0.0116, -0.0160, 0.0247, -0.0227, 0.0077, 0.0052, 0.0217, 0.0111,
             -0.0036, -0.0176, -0.0187, 0.0026, -0.0163, 0.0155], device='cuda:0'))
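For anyone debugging the same thing, here is a minimal standalone sketch of the update step (the helper mirrors the repo's update_ema_variables, rewritten with the non-deprecated add_ keyword form; the toy Linear models are made up for illustration). It confirms that the EMA step itself does write into the teacher's parameters:

    import torch
    import torch.nn as nn

    def update_ema_variables(model, ema_model, alpha, global_step):
        # Ramp the decay up from 0 toward the target alpha over early steps.
        alpha = min(1 - 1 / (global_step + 1), alpha)
        for ema_param, param in zip(ema_model.parameters(), model.parameters()):
            # In-place: ema = alpha * ema + (1 - alpha) * param
            ema_param.data.mul_(alpha).add_(param.data, alpha=1 - alpha)

    student = nn.Linear(4, 2)
    teacher = nn.Linear(4, 2)
    for p in teacher.parameters():
        p.requires_grad_(False)  # the teacher is never trained by backprop

    before = teacher.weight.clone()
    update_ema_variables(student, teacher, alpha=0.9, global_step=0)
    # At global_step 0 the ramped alpha is 0, so the teacher becomes an
    # exact copy of the student.
    print(torch.equal(teacher.weight, student.weight))  # True
    print(torch.equal(teacher.weight, before))          # False: weights moved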

YilinLiu97 · Feb 14 '19 16:02

Is the implementation wrong?

YilinLiu97 · Feb 14 '19 22:02

Those two add_ lines are equivalent, aren’t they?

https://pytorch.org/docs/stable/torch.html#torch.add
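They are indeed equivalent: in the PyTorch versions current at the time, add_(value, tensor) computed self += value * tensor (that positional overload was later deprecated in favor of the alpha= keyword). A quick sketch of the equivalence, using the modern spelling for the repo's form:

    import torch

    alpha = 0.9
    ema = torch.randn(6)
    p = torch.randn(6)

    # Repo's form, modern spelling: ema = alpha * ema + (1 - alpha) * p
    a = ema.clone().mul_(alpha).add_(p, alpha=1 - alpha)
    # Form proposed in the question:
    b = ema.clone().mul_(alpha).add_((1 - alpha) * p)

    print(torch.allclose(a, b))  # True: same EMA update either way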

tarvaina · Feb 15 '19 06:02

I realise that alpha is 0 at the beginning, since alpha = min(1 - 1 / (global_step + 1), 0.9); so at the start the teacher is simply overwritten with the student weights rather than smoothed. The code differs from what is stated in the paper. Code matching the paper would be alpha = max(1 - 1 / (global_step + 1), 0.9). A small sketch comparing the two schedules follows below.
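To make the difference concrete, a small sketch (plain Python, nothing beyond the formulas above): with min the decay ramps up from 0 and saturates at the target, while with max it starts at the target and keeps growing toward 1.

    target = 0.9
    for global_step in (0, 1, 9, 99, 999):
        ramp = 1 - 1 / (global_step + 1)
        print(global_step, min(ramp, target), max(ramp, target))
    # step   0: min -> 0.0  (teacher copies the student)   max -> 0.9
    # step   9: min -> 0.9                                 max -> 0.9
    # step 999: min -> 0.9  (stays at the target)          max -> 0.999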

XiaoYunZhou27 · Feb 02 '21 23:02