transformer-tensorflow
TensorFlow implementation of 'Attention Is All You Need (2017. 6)'
In model.py, line 49, why `self.decoder_inputs = tf.concat([start_tokens, target_slice_last_1], axis=1)`? `self.decoder_inputs` holds word ids, so why concatenate a zeros matrix?
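For context, a minimal sketch of the shift-right trick this question is about (toy tensors; the start-token id is assumed to be 0 here, which is why `start_tokens` looks like a zeros matrix):

```python
import tensorflow as tf

# Toy batch of target word ids (2 is an assumed </s> id, 0 is padding).
targets = tf.constant([[5, 8, 3, 2],
                       [7, 4, 2, 0]])

start_tokens = tf.zeros_like(targets[:, :1])        # one <s> column, id 0
target_slice_last_1 = targets[:, :-1]               # drop the last target token
decoder_inputs = tf.concat([start_tokens, target_slice_last_1], axis=1)
# decoder_inputs -> [[0, 5, 8, 3],
#                    [0, 7, 4, 2]]
```

At training step t the decoder must see the target token from step t-1, so the targets are shifted right by one and a start-token column is prepended; the "zeros" are just that start-token id.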
Typo: `blue` -> `bleu`
Does this support beam search?
In the paper, it says: PE(pos, 2i) = sin(pos / 10000 ** (2i / d_model)) and PE(pos, 2i+1) = cos(pos / 10000 ** (2i / d_model)). So for a flat dimension index i, the denominator should be 10000 ** (2 * (i // 2) / d_model). I rewrote the function as:...
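A minimal sketch of a positional-encoding function using the pair index `i // 2` as this comment suggests (NumPy; the function name and signature are illustrative, not the repo's actual API):

```python
import numpy as np

def positional_encoding(max_seq_length, d_model):
    # The denominator uses the pair index i // 2, i.e.
    # 10000 ** (2 * (i // 2) / d_model), so each sin/cos pair of
    # dimensions shares a single frequency, as in the paper.
    pos = np.arange(max_seq_length)[:, np.newaxis]          # (seq, 1)
    i = np.arange(d_model)[np.newaxis, :]                   # (1, d_model)
    angle = pos / np.power(10000.0, 2 * (i // 2) / d_model) # (seq, d_model)

    pe = np.zeros((max_seq_length, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                    # even dims: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])                    # odd dims: cosine
    return pe
```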
Hi Dongjun, in line 38 of the Graph class, the following loop continues until the max sequence length is decoded: `for i in range(2, Config.data.max_seq_length):`. Is it possible to break the...
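One way this early exit could look, sketched with placeholder names (`step_fn`, `EOS_ID`, and `greedy_decode` are assumptions for illustration, not the repo's API): track which sequences have emitted the end token and break once all of them have.

```python
import numpy as np

EOS_ID = 2  # assumed end-of-sentence token id

def greedy_decode(step_fn, start_ids, max_seq_length):
    # step_fn(decoded) returns the next token id per sequence, shape (batch,).
    decoded = np.asarray(start_ids).reshape(-1, 1)          # (batch, 1)
    finished = np.zeros(decoded.shape[0], dtype=bool)

    for _ in range(2, max_seq_length):
        next_ids = step_fn(decoded)                         # (batch,)
        decoded = np.concatenate([decoded, next_ids[:, None]], axis=1)
        finished |= next_ids == EOS_ID                      # mark ended sequences
        if finished.all():                                  # every sequence hit EOS
            break                                           # stop decoding early
    return decoded
```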