How to select the scale for the positional encoding: when should scale be used, and when not?
I read the positional encoding code and found this:
```python
def call(self, x):
    """ call function """
    seq_len = tf.shape(x)[1]
    if self.scale:
        # scale the input embeddings by sqrt(d_model) before adding the positional encoding
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
    x += self.pos_encoding[:, :seq_len, :]
    return x
```
My question is: when should `scale` be used, and when not? Is there any experimental result or theory to guide the selection?
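For context, here is a minimal standalone sketch of what the flag changes numerically (this is not Athena's code; the embedding values with standard deviation 1/sqrt(d_model) are my assumption about a typical embedding initialization). The sinusoidal encoding has entries of order 1, so if the embeddings are small, the encoding can dominate the sum unless the embeddings are scaled up by sqrt(d_model):

```python
import numpy as np
import tensorflow as tf

d_model, seq_len = 256, 10

# standard sinusoidal positional encoding ("Attention Is All You Need")
pos = np.arange(seq_len)[:, None]
i = np.arange(d_model)[None, :]
angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
pe = np.zeros((seq_len, d_model), dtype=np.float32)
pe[:, 0::2] = np.sin(angle[:, 0::2])
pe[:, 1::2] = np.cos(angle[:, 1::2])
pe = tf.constant(pe[None, ...])  # shape (1, seq_len, d_model)

# assumed embeddings with std 1/sqrt(d_model), as many embedding initializers use
x = tf.random.normal((1, seq_len, d_model), stddev=d_model ** -0.5)

scale = tf.math.sqrt(tf.cast(d_model, tf.float32))
print("||x||           :", float(tf.norm(x)))          # small relative to pe
print("||x * sqrt(d)|| :", float(tf.norm(x * scale)))  # comparable to pe
print("||pe||          :", float(tf.norm(pe)))
```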