practical-pytorch
Deep transition RNNs/Stacked RNNs
Hi, I'm looking at your tutorial for Translation with a Sequence to Sequence Network and Attention, where n_layers is the depth of your RNN. Thank you for a well-written and easy-to-follow tutorial. I have a couple of questions.
You are applying the RNN cell in a loop (for i in range(self.n_layers)) where the hidden state is fed from one layer to the next. According to https://arxiv.org/pdf/1312.6026.pdf, this is known as a Deep Transition (DT) RNN. The same paper also describes something called a stacked RNN, which is what I have previously referred to as a Deep RNN. Would it be a good idea to clarify the difference to avoid confusion?
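For what it's worth, here is a minimal sketch of the two variants in PyTorch (the sizes and variable names are made up for illustration, not taken from the tutorial):

```python
import torch
import torch.nn as nn

hidden_size = 8
n_layers = 3

# "Deep transition" style, as in the tutorial's loop: the *same* one-layer GRU
# is applied n_layers times within a single time step, so the hidden state
# passes through several non-linear transformations per step.
gru = nn.GRU(hidden_size, hidden_size)               # single layer, reused
x = torch.randn(1, 1, hidden_size)                   # (seq_len=1, batch=1, features)
hidden = torch.zeros(1, 1, hidden_size)
output = x
for _ in range(n_layers):
    output, hidden = gru(output, hidden)

# "Stacked" style: n_layers *distinct* GRU layers, each with its own weights,
# applied once per time step; layer i consumes the output of layer i-1.
stacked_gru = nn.GRU(hidden_size, hidden_size, num_layers=n_layers)
hidden0 = torch.zeros(n_layers, 1, hidden_size)      # one hidden state per layer
stacked_output, stacked_hidden = stacked_gru(x, hidden0)
```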
Also, I find the hardcoded batch size of 1 a bit confusing. Is there a good reason not to mention batching? It wouldn't make the code much more difficult to read.
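For example, a batched forward pass only changes the second dimension of the input and hidden tensors (the sizes below are arbitrary, just to show the shapes):

```python
import torch
import torch.nn as nn

hidden_size = 8
n_layers = 2
batch_size = 4
seq_len = 5

gru = nn.GRU(hidden_size, hidden_size, num_layers=n_layers)
inputs = torch.randn(seq_len, batch_size, hidden_size)      # batch is the 2nd dim
hidden = torch.zeros(n_layers, batch_size, hidden_size)
outputs, hidden = gru(inputs, hidden)
print(outputs.shape)   # torch.Size([5, 4, 8])
print(hidden.shape)    # torch.Size([2, 4, 8])
```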
Olof
@olofmogren I was confused by the GRU loop too when I checked the official tutorial at http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html. The tutorial in practical-pytorch now uses nn.GRU (which accepts the n_layers parameter, so there is no longer a loop over the GRU) and differs from the official tutorial. I think the "deep transition" behaviour may not have been intended in the official version; it may just be a bug ^_^. That's just my opinion.