
Issues with seq2seq tutorial (batch training)

Open gavril0 opened this issue 1 year ago • 0 comments

Add Link

Link to the tutorial:

https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

Describe the bug

The tutorial was markedly changed in June 2023 (see commit 6c03bb3bbe17100a3b45e0c92c564911e24ab796), which aimed to fix the implementation of attention, among other things (#2468). In doing so, several other things were changed as well:

  • a DataLoader was added that returns batches of zero-padded sequences for training the network
  • the forward() function of the decoder processes input one token at a time, in parallel for all sentences in the batch, until MAX_LENGTH is reached.
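For readers unfamiliar with this pattern, the batched decoding loop can be sketched roughly as follows (a minimal illustration with assumed sizes and layer names, not the tutorial's actual code): each iteration of the time loop advances every sentence in the batch by one token, and the loop always runs MAX_LENGTH times regardless of the true sentence lengths.

```python
import torch
import torch.nn as nn

# Assumed toy dimensions, not from the tutorial
MAX_LENGTH, batch_size, hidden_size, vocab_size = 10, 4, 16, 100
SOS_token = 1

embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
out = nn.Linear(hidden_size, vocab_size)

# Every sentence in the batch starts with the SOS token
decoder_input = torch.full((batch_size, 1), SOS_token, dtype=torch.long)
hidden = torch.zeros(1, batch_size, hidden_size)

outputs = []
for _ in range(MAX_LENGTH):
    emb = embedding(decoder_input)         # (batch, 1, hidden)
    rnn_out, hidden = gru(emb, hidden)     # one GRU step for all sentences
    logits = out(rnn_out)                  # (batch, 1, vocab)
    outputs.append(logits)
    decoder_input = logits.argmax(dim=-1)  # feed back the greedy prediction
outputs = torch.cat(outputs, dim=1)        # (batch, MAX_LENGTH, vocab)
```

Note that short sentences keep being "decoded" past their end-of-sequence token, which is one of the implications of the batched design discussed below.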

I am not a torch expert, but I think the embedding layers in the encoder and decoder should have been modified to recognize padding (padding_idx=0 is missing). Using zero-padded sequences as input might also have other implications during training, but I am not sure. Can you confirm that the implementation is correct?
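To make the padding_idx point concrete, here is a small sketch (standalone, not the tutorial's code) of what the argument does: nn.Embedding with padding_idx=0 pins the embedding of token 0 to a zero vector and masks its gradient, so the pad token never accumulates learned content.

```python
import torch
import torch.nn as nn

# padding_idx=0 fixes row 0 of the embedding table to zeros
emb = nn.Embedding(num_embeddings=8, embedding_dim=4, padding_idx=0)

batch = torch.tensor([[5, 2, 0, 0],   # zero-padded sequences
                      [3, 1, 4, 0]])
vecs = emb(batch)
print(vecs[0, 2])          # embedding of a pad position: all zeros

vecs.sum().backward()
print(emb.weight.grad[0])  # gradient at the pad row stays zero
```

Without padding_idx, token 0 gets an ordinary trainable embedding, which is what the current tutorial code appears to do.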

As a result of these changes, the text no longer describes the code well. I think it would be nice to include a discussion of zero-padding and of the implications of batching for the code in the tutorial. I am also curious whether there is really a gain from using batches, since most sentences are short.

Finally, the text mentions a teacher_forcing_ratio that is not present in the code. Either the tutorial text or the code needs to be adjusted.
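For reference, a teacher_forcing_ratio is typically used as sketched below (function and parameter names are assumed for illustration, not taken from the tutorial): with the given probability the ground-truth target token is fed as the next decoder input, otherwise the decoder's own prediction is used.

```python
import random
import torch

def next_decoder_input(target_token, predicted_logits, teacher_forcing_ratio):
    """Choose the next decoder input at one decoding step (hypothetical helper)."""
    if random.random() < teacher_forcing_ratio:
        return target_token                     # teacher forcing: feed the truth
    return predicted_logits.argmax(dim=-1)      # free running: feed the prediction
```

With a ratio of 1.0 this reduces to the pure teacher forcing that the current tutorial code implements implicitly; with 0.0 the decoder always consumes its own predictions during training.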

If this is useful, I found another implementation of the same tutorial that appears to be a fork of a previous version (it was archived in 2021):

  • It does not use batches
  • It includes a teacher_forcing_ratio to control the amount of teacher forcing
  • It implements both the Luong et al. and Bahdanau et al. attention models

Describe your environment

I appreciate this tutorial as it provides a simple introduction to Seq2Seq models with a small dataset. I am actually trying to port it to R with the torch package.

cc @albanD

gavril0 · Apr 18 '24 19:04