dress-code
Warper and HPE training hyperparams
I noticed that the learning rates for both the warper and the HPE were fixed during training (100k/50k iterations) according to the paper. With these parameters, I see the loss stagnating after a few thousand iterations.
Have you tried LR decay or different hyperparameters? Also, why did you use such a high number of iterations? Did you observe a steady decrease in loss over the full 50k iterations?
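For reference, the kind of schedule I have in mind is a simple step decay of the learning rate. This is just a generic sketch; the decay factor and interval below are hypothetical examples, not values from the paper:

```python
def decayed_lr(base_lr: float, step: int,
               decay_factor: float = 0.5,
               decay_every: int = 10_000) -> float:
    """Step-wise LR decay: multiply the base LR by `decay_factor`
    every `decay_every` iterations.

    Hypothetical values for illustration only, not the authors' setup.
    """
    return base_lr * decay_factor ** (step // decay_every)

# With base_lr=1e-4: LR stays at 1e-4 for steps 0-9999,
# then drops to 5e-5, 2.5e-5, ... every 10k steps.
print(decayed_lr(1e-4, 0))
print(decayed_lr(1e-4, 15_000))
```

The same effect could be achieved with a standard scheduler (e.g. PyTorch's `torch.optim.lr_scheduler.StepLR`) instead of a fixed LR for the full 100k/50k iterations.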