A PyTorch implementation of "Attention is All You Need" and "Weighted Transformer Network for Machine Translation"

Results: 18 transformer issues, sorted by recently updated

1. A dropout layer should be applied between the two fully-connected layers in the FFN.
2. In the embedding layers, the weights should be multiplied by sqrt(d_model).

![image](https://user-images.githubusercontent.com/13014170/131800776-5bfad1f0-2e4e-4562-852d-0538e810a9f9.png)
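A minimal sketch of the two changes this issue asks for, assuming d_model=512 and d_ff=2048 as in the repo's defaults; the class names `PositionwiseFFN` and `ScaledEmbedding` are illustrative, not the names used in this codebase:

```python
import math
import torch
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    """Position-wise feed-forward layer with dropout between the two linear maps."""
    def __init__(self, d_model=512, d_ff=2048, dropout=0.1):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)  # the dropout the issue suggests adding

    def forward(self, x):
        return self.fc2(self.dropout(torch.relu(self.fc1(x))))

class ScaledEmbedding(nn.Module):
    """Token embedding scaled by sqrt(d_model), as in Section 3.4 of the original paper."""
    def __init__(self, vocab_size, d_model=512, padding_idx=0):
        super().__init__()
        self.d_model = d_model
        self.emb = nn.Embedding(vocab_size, d_model, padding_idx=padding_idx)

    def forward(self, tokens):
        return self.emb(tokens) * math.sqrt(self.d_model)
```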

Hi. I would like to add some features to this code that I have read about for beam search, such as coverage penalty and length normalization, but I don't know where...
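For reference, a small sketch of the two rescoring terms mentioned in the issue, following the GNMT-style formulas (Wu et al., 2016); the function names and the alpha/beta defaults are illustrative, and where to hook them into this repo's beam search is exactly the open question:

```python
import math

def length_penalty(length, alpha=0.6):
    """GNMT length normalization: lp(Y) = ((5 + |Y|) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha

def coverage_penalty(attn_probs, beta=0.2):
    """GNMT coverage penalty over accumulated source attention.

    attn_probs: one attention distribution over the source per decoding step,
    each a list of length src_len summing to 1.
    """
    src_len = len(attn_probs[0])
    penalty = 0.0
    for j in range(src_len):
        coverage = min(sum(step[j] for step in attn_probs), 1.0)
        penalty += math.log(coverage) if coverage > 0 else -1e9
    return beta * penalty

def rescore(log_prob_sum, length, attn_probs, alpha=0.6, beta=0.2):
    """Final beam score: s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y)."""
    return log_prob_sum / length_penalty(length, alpha) + coverage_penalty(attn_probs, beta)
```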

The test set is useless and there are lots of bugs....

The error "ValueError: only one element tensors can be converted to Python scalars" occurred in L79: input_pos = tensor([list(range(1, len+1)) + [0]*(max_len-len) for len in input_len]) in modules.py. I want...

```python
# train.py line 104
enc_inputs, enc_inputs_len = batch.src
dec_, dec_inputs_len = batch.trg
dec_inputs = dec_[:, :-1]
dec_targets = dec_[:, 1:]
dec_inputs_len = dec_inputs_len - 1
```
In the original paper...
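The slicing above is the standard shifted-right teacher-forcing scheme; a toy illustration of what it produces (the index values chosen for bos/eos/pad are hypothetical, not this repo's vocabulary):

```python
import torch

# Toy target batch: <bos>=1, content tokens, <eos>=2, pad=0.
trg = torch.tensor([[1, 7, 8, 9, 2],
                    [1, 4, 5, 2, 0]])
trg_len = torch.tensor([5, 4])

dec_inputs = trg[:, :-1]       # <bos> w1 ... (what the decoder reads)
dec_targets = trg[:, 1:]       # w1 ... <eos> (what the decoder must predict)
dec_inputs_len = trg_len - 1   # each sequence loses one position after the shift
```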

```
python3 train.py -model_path models -data_path models/preprocess-train.t7
Namespace(batch_size=128, d_ff=2048, d_k=64, d_model=512, d_v=64, data_path='models/preprocess-train.t7', display_freq=100, dropout=0.1, log=None, lr=0.0002, max_epochs=10, max_grad_norm=None, max_src_seq_len=50, max_tgt_seq_len=50, model_path='models', n_heads=8, n_layers=6, n_warmup_steps=4000, share_embs_weight=False, share_proj_weight=False, weighted_model=False)
Loading training and...
```