Feedback about NLP From Scratch: Translation with a Sequence to Sequence Network and Attention
There is the following issue on this page: https://docs.pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
Isn't this code a little problematic batch-wise? Basically, the GRU in the encoder returns the latest hidden state, which for shorter sequences could be the hidden state of a PAD token. Also, shouldn't the CE loss be skipped for PAD tokens?
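For illustration, here is a minimal sketch of how both concerns are commonly handled in PyTorch: packing the padded batch before the encoder GRU so the returned hidden state comes from each sequence's last real token rather than a PAD step, and passing ignore_index to the loss so PAD positions are excluded. This is not the tutorial's code; PAD_IDX, the class name, and the toy tensors are assumptions for the example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD_IDX = 0  # assumed padding token id; the tutorial may use a different value

class PackedEncoderGRU(nn.Module):
    """Illustrative encoder: packs the padded batch so the GRU's final
    hidden state corresponds to each sequence's last real token, not PAD."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size, padding_idx=PAD_IDX)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, input_ids, lengths):
        embedded = self.embedding(input_ids)                      # (B, T, H)
        packed = pack_padded_sequence(embedded, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        packed_out, hidden = self.gru(packed)                     # hidden taken at last real step
        outputs, _ = pad_packed_sequence(packed_out, batch_first=True)
        return outputs, hidden

# ignore_index makes the loss skip PAD positions in the target.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# Toy usage with hypothetical shapes/values:
encoder = PackedEncoderGRU(vocab_size=10, hidden_size=8)
batch = torch.tensor([[4, 5, 6, PAD_IDX], [7, 8, PAD_IDX, PAD_IDX]])
lengths = torch.tensor([3, 2])
enc_out, enc_hidden = encoder(batch, lengths)      # enc_hidden ignores PAD steps

logits = torch.randn(2, 4, 10)                     # stand-in decoder output (B, T, V)
targets = torch.tensor([[4, 5, 6, PAD_IDX], [7, 8, PAD_IDX, PAD_IDX]])
loss = criterion(logits.view(-1, 10), targets.view(-1))  # PAD targets contribute nothing
```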