seq2seq
Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch
I tried the code yesterday; after 100 epochs the training error went almost to zero, yet the test error is 7.23, rendering the model almost useless. Early stopping won't...
```
loss = F.nll_loss(output[1:].view(-1, vocab_size), trg[1:].contiguous().view(-1), ignore_index=pad)
```
The loss computed by the line above is the average over every time step, which can make it difficult to train the model. So...
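For context, a minimal sketch of how this reduction behaves, assuming a decoder output of shape (trg_len, batch, vocab_size) and a padding index `pad` as in the snippet above; the per-sequence sum shown at the end is one common alternative the commenter may have in mind, not the repository's code.

```python
import torch
import torch.nn.functional as F

trg_len, batch, vocab_size, pad = 10, 4, 100, 0
output = torch.randn(trg_len, batch, vocab_size).log_softmax(-1)  # decoder log-probabilities
trg = torch.randint(1, vocab_size, (trg_len, batch))              # target token indices

# Default reduction='mean': averages the loss over all non-ignored tokens.
mean_loss = F.nll_loss(output[1:].view(-1, vocab_size),
                       trg[1:].contiguous().view(-1), ignore_index=pad)

# Possible alternative: sum over tokens, then normalize by batch size,
# so each sequence contributes its total (not average) token loss.
sum_loss = F.nll_loss(output[1:].view(-1, vocab_size),
                      trg[1:].contiguous().view(-1),
                      ignore_index=pad, reduction='sum') / batch
```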
Thank you for sharing this project's code. I have a question about nn.Embedding: in this project, the shape of `src` and `trg` is (maxLen, batch size). The forward of...
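For reference, a minimal sketch (not the repository's code) of how nn.Embedding handles a time-major (max_len, batch_size) index tensor: the lookup is applied elementwise, so the output simply gains a trailing embedding dimension.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, max_len, batch_size = 1000, 256, 20, 32
embedding = nn.Embedding(vocab_size, embed_dim)

src = torch.randint(0, vocab_size, (max_len, batch_size))  # time-major token indices
embedded = embedding(src)                                  # (max_len, batch_size, embed_dim)
print(embedded.shape)                                      # torch.Size([20, 32, 256])
```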
## 1. Attention's formula

- In the normal additive version, the attention score is as follows:
  ```
  score = v * tanh(W * [hidden; encoder_outputs])
  ```
- In your code:
  ```
  score = ...
  ```
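For comparison, a minimal sketch of the standard additive (Bahdanau-style) scoring quoted above, score = v * tanh(W * [hidden; encoder_outputs]); this is a generic illustration, not the repository's Attention module, and the dimension names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.W = nn.Linear(hidden_dim * 2, hidden_dim)  # projects [hidden; encoder_output]
        self.v = nn.Linear(hidden_dim, 1, bias=False)   # scoring vector v

    def forward(self, hidden, encoder_outputs):
        # hidden:          (batch, hidden_dim)           current decoder state
        # encoder_outputs: (src_len, batch, hidden_dim)  all encoder states
        src_len = encoder_outputs.size(0)
        hidden = hidden.unsqueeze(0).repeat(src_len, 1, 1)                # (src_len, batch, hidden_dim)
        energy = torch.tanh(self.W(torch.cat((hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)                                # (src_len, batch)
        return F.softmax(scores, dim=0)                                   # weights over source positions
```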
Does this model include an inference mode? As far as I can see, the `forward` function requires the target sentence.
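One common way to run such a model without targets is greedy decoding, feeding each predicted token back in as the next input. The sketch below is a generic illustration under assumed interfaces (`encoder`, `decoder`, `sos_idx`, `eos_idx`), not the repository's API.

```python
import torch

def greedy_decode(encoder, decoder, src, max_len=50, sos_idx=2, eos_idx=3):
    """Decode one source batch without target sentences (hypothetical encoder/decoder interfaces)."""
    with torch.no_grad():
        encoder_outputs, hidden = encoder(src)                            # encode the whole source
        inp = torch.full((src.size(1),), sos_idx, dtype=torch.long)       # start every sequence with <sos>
        outputs = []
        for _ in range(max_len):
            logits, hidden, _ = decoder(inp, hidden, encoder_outputs)     # one decoding step
            inp = logits.argmax(dim=1)                                    # feed back the greedy choice
            outputs.append(inp)
            if (inp == eos_idx).all():                                    # stop when every sequence hit <eos>
                break
    return torch.stack(outputs)                                           # (out_len, batch)
```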
It seems that the way the attention weights are calculated differs from the original paper, softmax(v * tanh(W * [s, h])): here a ReLU is applied after the softmax. Can you give a reason or a reference? `...
What are the exact PyTorch and torchtext versions for your code? I am trying to downgrade to a previous version in order to avoid the Multi30k.split() problem, but have not succeeded.
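As a point of reference, the legacy torchtext API (roughly torchtext 0.6 through 0.8, paired with a correspondingly older PyTorch) loaded Multi30k as sketched below; the exact versions this repository targets are not stated here, so the pairing is an assumption.

```python
# Legacy torchtext API (pre-0.9 style); exact compatible versions are an assumption.
from torchtext.data import Field
from torchtext.datasets import Multi30k

SRC = Field(init_token='<sos>', eos_token='<eos>', lower=True)
TRG = Field(init_token='<sos>', eos_token='<eos>', lower=True)

# Note the plural: the classic call is Multi30k.splits(), not Multi30k.split().
train_data, val_data, test_data = Multi30k.splits(exts=('.de', '.en'), fields=(SRC, TRG))
```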
What is `encoder_outputs`? Could you explain it in more detail?
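In a typical GRU encoder, `encoder_outputs` is the stack of hidden states from every source time step (the tensor attention later attends over), while the final hidden state initializes the decoder. A minimal sketch with assumed dimensions, not the repository's exact encoder:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, src_len, batch_size = 256, 512, 20, 32
gru = nn.GRU(embed_dim, hidden_dim)

embedded = torch.randn(src_len, batch_size, embed_dim)   # embedded source tokens
encoder_outputs, hidden = gru(embedded)
# encoder_outputs: (src_len, batch_size, hidden_dim) -> one state per source position,
#                  attended over at every decoding step.
# hidden:          (1, batch_size, hidden_dim)       -> final state, used to start the decoder.
```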