transformer
A PyTorch implementation of "Attention Is All You Need" and "Weighted Transformer Network for Machine Translation"
How to use it?
Two discrepancies from the paper: 1. A dropout layer should sit between the two fully connected layers in the FFN. 2. In the embedding layers, the weights should be multiplied by sqrt(d_model). Both fixes are sketched below.
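A minimal sketch of both fixes in PyTorch (the class names PositionwiseFFN and ScaledEmbedding are illustrative, not taken from this repo):
'''
import math
import torch
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    # Fix 1: dropout sits between the two fully connected layers.
    def __init__(self, d_model=512, d_ff=2048, dropout=0.1):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.fc2(self.dropout(torch.relu(self.fc1(x))))

class ScaledEmbedding(nn.Module):
    # Fix 2: embedding outputs are multiplied by sqrt(d_model).
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.d_model = d_model

    def forward(self, tokens):
        return self.emb(tokens) * math.sqrt(self.d_model)
'''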
Hi. I would like to add some features to this code that I have read about for beam search, like coverage penalty and length normalization, but I don't know where...
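For reference, both features come from the GNMT paper (Wu et al., 2016), where hypotheses are ranked by score(Y, X) = log P(Y|X) / lp(Y) + cp(X, Y). A minimal sketch of that rescoring, assuming you have each hypothesis's total log-probability and its accumulated attention matrix (the names below are hypothetical, not from this repo):
'''
import torch

def length_penalty(length, alpha=0.6):
    # GNMT length normalization: lp(Y) = ((5 + |Y|) / 6) ** alpha
    return ((5.0 + length) / 6.0) ** alpha

def coverage_penalty(attn, beta=0.2):
    # attn: (target_len, source_len) attention weights of one hypothesis.
    # GNMT coverage penalty: sum attention received by each source token
    # over all target steps, clamp at 1, and reward covered positions.
    coverage = attn.sum(dim=0).clamp(max=1.0)
    return beta * torch.log(coverage).sum()

def rescore(log_prob, length, attn, alpha=0.6, beta=0.2):
    # Final beam score: length-normalized log-prob plus coverage bonus.
    return log_prob / length_penalty(length, alpha) + coverage_penalty(attn, beta)
'''
Finished hypotheses would then be ranked by rescore() instead of raw log-probability; where exactly to hook this in depends on how this repo's beam search stores per-step attention.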
The test set is useless and there are lots of bugs...
The error "ValueError: only one element tensors can be converted to Python scalars" occurred in L79: input_pos = tensor([list(range(1, len+1)) + [0]*(max_len-len) for len in input_len]) in modules.py. I want...
'''
# train.py line 104
enc_inputs, enc_inputs_len = batch.src
dec_, dec_inputs_len = batch.trg
dec_inputs = dec_[:, :-1]
dec_targets = dec_[:, 1:]
dec_inputs_len = dec_inputs_len - 1
'''
In the original paper...
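For context, this snippet is the standard teacher-forcing shift: the decoder input drops the final token and the target drops the leading BOS, so position t is trained to predict token t+1. A toy illustration (token values hypothetical):
'''
import torch

BOS, EOS = 1, 2
dec_ = torch.tensor([[BOS, 11, 12, 13, EOS]])
dec_inputs = dec_[:, :-1]    # [[BOS, 11, 12, 13]] -- fed to the decoder
dec_targets = dec_[:, 1:]    # [[11, 12, 13, EOS]] -- labels per position
# The shifted input is one token shorter, hence dec_inputs_len - 1.
'''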
python3 train.py -model_path models -data_path models/preprocess-train.t7
Namespace(batch_size=128, d_ff=2048, d_k=64, d_model=512, d_v=64, data_path='models/preprocess-train.t7', display_freq=100, dropout=0.1, log=None, lr=0.0002, max_epochs=10, max_grad_norm=None, max_src_seq_len=50, max_tgt_seq_len=50, model_path='models', n_heads=8, n_layers=6, n_warmup_steps=4000, share_embs_weight=False, share_proj_weight=False, weighted_model=False)
Loading training and...