Question on noam_scheme(): learning_rate

Open luckysofia opened this issue 6 years ago • 2 comments

I have a question about noam_scheme(), which returns:

    init_lr * warmup_steps ** 0.5 * tf.minimum(step * warmup_steps ** -1.5, step ** -0.5)

Why do you use 'init_lr * warmup_steps ** 0.5' rather than 'd_model ** -0.5'? Thanks a lot! @Kyubyong

Mar 15 '19 08:03 luckysofia

@Kyubyong

Mar 27 '19 08:03 luckysofia

I have the same doubt. Maybe it should be changed to d_model ** -0.5. In any case, I will run experiments comparing the two methods.

Jun 06 '19 08:06 alphadl
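
For anyone comparing the two variants, here is a minimal pure-Python sketch (not the repository's code) of both schedules. The function names noam_repo and noam_paper are made up for illustration, and step is assumed to start at 1. The two expressions share the same warmup/decay shape and differ only by a constant factor: the quoted version peaks at exactly init_lr when step == warmup_steps, while the 'd_model ** -0.5' version peaks at d_model ** -0.5 * warmup_steps ** -0.5.

```python
def noam_repo(step, init_lr, warmup_steps):
    """Schedule as quoted in this issue, with min() standing in for tf.minimum."""
    step = float(step)  # assumes step >= 1; step == 0 would divide by zero
    return init_lr * warmup_steps ** 0.5 * min(step * warmup_steps ** -1.5,
                                               step ** -0.5)


def noam_paper(step, d_model, warmup_steps):
    """Variant scaled by d_model ** -0.5, as suggested in the question."""
    step = float(step)  # assumes step >= 1
    return d_model ** -0.5 * min(step * warmup_steps ** -1.5, step ** -0.5)


def equivalent_init_lr(d_model, warmup_steps):
    """init_lr for which noam_repo matches noam_paper at every step."""
    return d_model ** -0.5 * warmup_steps ** -0.5


if __name__ == "__main__":
    # Illustrative values only; they are not taken from this thread.
    d_model, warmup_steps, init_lr = 512, 4000, 0.0003
    for step in (1, 1000, 4000, 16000):
        print(step,
              noam_repo(step, init_lr, warmup_steps),
              noam_paper(step, d_model, warmup_steps))
    print("matching init_lr:", equivalent_init_lr(d_model, warmup_steps))
```

With d_model = 512 and warmup_steps = 4000, the matching init_lr is about 7.0e-4, so the choice between the two forms comes down to whether the peak learning rate is set directly or derived from the model width.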
