transformer
Question on noam_scheme(): learning_rate
I have a question about `noam_scheme()`, which returns `init_lr * warmup_steps ** 0.5 * tf.minimum(step * warmup_steps ** -1.5, step ** -0.5)`. Why do you use `init_lr * warmup_steps ** 0.5` as the scale factor rather than `d_model ** -0.5`? Thanks a lot! @Kyubyong
@Kyubyong I have the same doubt. Maybe it should be changed to `d_model ** -0.5`. In any case, I will run experiments with both variants.
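
For what it's worth, the two forms appear to differ only by a constant factor, since the `tf.minimum(...)` term is identical in both. With the quoted expression the schedule peaks at exactly `init_lr` when `step == warmup_steps`, while the `d_model ** -0.5` form peaks at `(d_model * warmup_steps) ** -0.5`. Below is a minimal sketch to check this in plain Python (no TensorFlow). The function names `noam_repo` and `noam_paper`, the values `d_model=512` and `warmup_steps=4000`, and the `init_lr` of `0.0003` are illustrative assumptions, not settings taken from this repository.

```python
import math


def noam_repo(step, init_lr, warmup_steps=4000):
    """Schedule as quoted in this issue, with min() standing in for tf.minimum."""
    return init_lr * warmup_steps ** 0.5 * min(step * warmup_steps ** -1.5, step ** -0.5)


def noam_paper(step, d_model=512, warmup_steps=4000):
    """Variant scaled by d_model ** -0.5, as proposed in the question."""
    return d_model ** -0.5 * min(step * warmup_steps ** -1.5, step ** -0.5)


# Assumed illustrative values; step must be >= 1 because of step ** -0.5.
d_model, warmup_steps = 512, 4000

# Choosing init_lr this way makes the two schedules coincide at every step.
matching_init_lr = (d_model * warmup_steps) ** -0.5

for step in (1, 100, warmup_steps, 100_000):
    a = noam_repo(step, matching_init_lr, warmup_steps)
    b = noam_paper(step, d_model, warmup_steps)
    print(step, a, b, math.isclose(a, b))

# With the quoted form, the peak learning rate equals init_lr itself.
print(noam_repo(warmup_steps, init_lr=0.0003, warmup_steps=warmup_steps))
```

If this reading is right, the quoted form simply lets you set the peak learning rate directly through `init_lr` instead of tying it to `d_model`, so an experiment comparing the two would mostly be comparing two different peak learning rates.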