
question about learning rate

Open · HiXiaochen opened this issue 3 years ago · 0 comments

In "Attention Is All You Need", the learning rate in the Noam decay schedule is formulated as:

$$\text{lrate} = d_{\text{model}}^{-0.5} \cdot \min\left(\text{step\_num}^{-0.5},\ \text{step\_num} \cdot \text{warmup\_steps}^{-1.5}\right)$$

But in your code I found an extra `original_lr` factor, set to 0.05:

```python
self._set_rate(
    self.original_lr *
    (self.model_size ** -0.5 *
     min(self._step ** (-0.5),
         self._step * self.warmup_steps ** (-1.5))))
```

Why do we need to add this term?
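For context, here is a minimal, self-contained sketch of the Noam schedule with such an extra scaling factor. The function name `noam_rate` and the default values for `model_size`, `warmup_steps`, and `original_lr` below are illustrative, not taken from the AdaptSum code:

```python
def noam_rate(step, model_size=512, warmup_steps=4000, original_lr=0.05):
    """Noam schedule scaled by an extra constant factor (original_lr).

    With original_lr = 1.0 this reduces to the formula from
    "Attention Is All You Need"; the extra factor just rescales
    the whole learning-rate curve up or down uniformly.
    """
    return original_lr * (model_size ** -0.5 *
                          min(step ** -0.5, step * warmup_steps ** -1.5))

# The unscaled schedule peaks at step == warmup_steps; original_lr
# lets you tune the peak learning rate without changing model_size.
for step in (1, 1000, 4000, 16000):
    print(step, noam_rate(step))
```

Since the base formula ties the learning-rate magnitude to `model_size`, a constant multiplier like this is a common way to retune the peak learning rate for a different model or fine-tuning setup without altering the shape of the warmup/decay curve.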

HiXiaochen · Sep 16 '21