seq2seq
Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch
I tried the code yesterday; after 100 epochs the training error went almost to zero, yet the test error is 7.23, rendering the model almost useless. Early stopping won't...
```
loss = F.nll_loss(output[1:].view(-1, vocab_size), trg[1:].contiguous().view(-1), ignore_index=pad)
```
The loss computed by the line above is the average over every time step, which can make it difficult to train the model. So...
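For context, a minimal sketch of how this reduction behaves, assuming a decoder output of shape (trg_len, batch, vocab_size) and a padding index `pad` as in the snippet above; the per-sequence sum shown at the end is one common alternative the commenter may have in mind, not the repository's code.

```python
import torch
import torch.nn.functional as F

trg_len, batch, vocab_size, pad = 10, 4, 100, 0
output = torch.randn(trg_len, batch, vocab_size).log_softmax(-1)  # decoder log-probabilities
trg = torch.randint(1, vocab_size, (trg_len, batch))              # target token indices

# Default reduction='mean': averages the loss over all non-ignored tokens.
mean_loss = F.nll_loss(output[1:].view(-1, vocab_size),
                       trg[1:].contiguous().view(-1), ignore_index=pad)

# Possible alternative: sum over tokens, then normalize by batch size,
# so each sequence contributes its total (not average) token loss.
sum_loss = F.nll_loss(output[1:].view(-1, vocab_size),
                      trg[1:].contiguous().view(-1),
                      ignore_index=pad, reduction='sum') / batch
```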
Thank you for sharing this project's code. I have a question about nn.Embedding: in this project, the shape of `src` and `trg` is (maxLen, batch size). The forward of...
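For reference, a minimal sketch (not the repository's code) of how nn.Embedding handles a time-major (max_len, batch_size) index tensor: the lookup is applied elementwise, so the output simply gains a trailing embedding dimension.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, max_len, batch_size = 1000, 256, 20, 32
embedding = nn.Embedding(vocab_size, embed_dim)

src = torch.randint(0, vocab_size, (max_len, batch_size))  # time-major token indices
embedded = embedding(src)                                  # (max_len, batch_size, embed_dim)
print(embedded.shape)                                      # torch.Size([20, 32, 256])
```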
## 1. Attention's formula

- In the normal additive version, the attention score is as follows:
  ```
  score = v * tanh(W * [hidden; encoder_outputs])
  ```
- In your code:
  ```
  score = ...
  ```
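For comparison, a minimal sketch of the standard additive (Bahdanau-style) scoring quoted above, score = v * tanh(W * [hidden; encoder_outputs]); this is a generic illustration, not the repository's Attention module, and the dimension names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.W = nn.Linear(hidden_dim * 2, hidden_dim)  # projects [hidden; encoder_output]
        self.v = nn.Linear(hidden_dim, 1, bias=False)   # scoring vector v

    def forward(self, hidden, encoder_outputs):
        # hidden:          (batch, hidden_dim)           current decoder state
        # encoder_outputs: (src_len, batch, hidden_dim)  all encoder states
        src_len = encoder_outputs.size(0)
        hidden = hidden.unsqueeze(0).repeat(src_len, 1, 1)                # (src_len, batch, hidden_dim)
        energy = torch.tanh(self.W(torch.cat((hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)                                # (src_len, batch)
        return F.softmax(scores, dim=0)                                   # weights over source positions
```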
Does this model include an inference mode? As far as I can see, the `forward` function requires the target sentence.
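One common way to run such a model without targets is greedy decoding, feeding each predicted token back in as the next input. The sketch below is a generic illustration under assumed interfaces (`encoder`, `decoder`, `sos_idx`, `eos_idx`), not the repository's API.

```python
import torch

def greedy_decode(encoder, decoder, src, max_len=50, sos_idx=2, eos_idx=3):
    """Decode one source batch without target sentences (hypothetical encoder/decoder interfaces)."""
    with torch.no_grad():
        encoder_outputs, hidden = encoder(src)                            # encode the whole source
        inp = torch.full((src.size(1),), sos_idx, dtype=torch.long)       # start every sequence with <sos>
        outputs = []
        for _ in range(max_len):
            logits, hidden, _ = decoder(inp, hidden, encoder_outputs)     # one decoding step
            inp = logits.argmax(dim=1)                                    # feed back the greedy choice
            outputs.append(inp)
            if (inp == eos_idx).all():                                    # stop when every sequence hit <eos>
                break
    return torch.stack(outputs)                                           # (out_len, batch)
```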
It seems that the way the attention weights are calculated differs from the original paper, softmax(v * tanh(W * [s, h])): here a ReLU is applied after the softmax. Can you give a reason or a reference? `...
What are the exact PyTorch and torchtext versions for your code? I am trying to downgrade to a previous version in order to avoid the Multi30k.split() problem, but have not succeeded.
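As a point of reference, the legacy torchtext API (roughly torchtext 0.6 through 0.8, paired with a correspondingly older PyTorch) loaded Multi30k as sketched below; the exact versions this repository targets are not stated here, so the pairing is an assumption.

```python
# Legacy torchtext API (pre-0.9 style); exact compatible versions are an assumption.
from torchtext.data import Field
from torchtext.datasets import Multi30k

SRC = Field(init_token='<sos>', eos_token='<eos>', lower=True)
TRG = Field(init_token='<sos>', eos_token='<eos>', lower=True)

# Note the plural: the classic call is Multi30k.splits(), not Multi30k.split().
train_data, val_data, test_data = Multi30k.splits(exts=('.de', '.en'), fields=(SRC, TRG))
```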
What is `encoder_outputs`? Could you explain it in more detail?
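In a typical GRU encoder, `encoder_outputs` is the stack of hidden states from every source time step (the tensor attention later attends over), while the final hidden state initializes the decoder. A minimal sketch with assumed dimensions, not the repository's exact encoder:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, src_len, batch_size = 256, 512, 20, 32
gru = nn.GRU(embed_dim, hidden_dim)

embedded = torch.randn(src_len, batch_size, embed_dim)   # embedded source tokens
encoder_outputs, hidden = gru(embedded)
# encoder_outputs: (src_len, batch_size, hidden_dim) -> one state per source position,
#                  attended over at every decoding step.
# hidden:          (1, batch_size, hidden_dim)       -> final state, used to start the decoder.
```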