Feedback about NLP From Scratch: Translation with a Sequence to Sequence Network and Attention
There is the following issue on this page: https://docs.pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
Isn't this code a little problematic batch-wise? Basically, the GRU in the encoder returns the latest hidden state, which for shorter sequences could be the hidden state of a PAD token. Also, shouldn't the CE loss be skipped for PAD tokens?
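For illustration, here is a minimal sketch of how both concerns are commonly handled in PyTorch: packing the padded batch before the encoder GRU so the returned hidden state comes from each sequence's last real token rather than a PAD step, and passing ignore_index to the loss so PAD positions are excluded. This is not the tutorial's code; PAD_IDX, the class name, and the toy tensors are assumptions for the example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD_IDX = 0  # assumed padding token id; the tutorial may use a different value

class PackedEncoderGRU(nn.Module):
    """Illustrative encoder: packs the padded batch so the GRU's final
    hidden state corresponds to each sequence's last real token, not PAD."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size, padding_idx=PAD_IDX)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, input_ids, lengths):
        embedded = self.embedding(input_ids)                      # (B, T, H)
        packed = pack_padded_sequence(embedded, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        packed_out, hidden = self.gru(packed)                     # hidden taken at last real step
        outputs, _ = pad_packed_sequence(packed_out, batch_first=True)
        return outputs, hidden

# ignore_index makes the loss skip PAD positions in the target.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# Toy usage with hypothetical shapes/values:
encoder = PackedEncoderGRU(vocab_size=10, hidden_size=8)
batch = torch.tensor([[4, 5, 6, PAD_IDX], [7, 8, PAD_IDX, PAD_IDX]])
lengths = torch.tensor([3, 2])
enc_out, enc_hidden = encoder(batch, lengths)      # enc_hidden ignores PAD steps

logits = torch.randn(2, 4, 10)                     # stand-in decoder output (B, T, V)
targets = torch.tensor([[4, 5, 6, PAD_IDX], [7, 8, PAD_IDX, PAD_IDX]])
loss = criterion(logits.view(-1, 10), targets.view(-1))  # PAD targets contribute nothing
```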