direct-pretraining
Fixed learning rate during pretraining
Hi,
Thank you for your work. I was wondering what the motivation is for using a fixed learning rate during pretraining. Is it based purely on empirical results, or is there a particular reason?