practical-pytorch
Deep transition RNNs/Stacked RNNs
Hi, I'm looking at your tutorial for Translation with a Sequence to Sequence Network and Attention, where n_layers is the depth of your RNN. Thank you for a well-written and easy-to-follow tutorial. I have a couple of questions.
You are applying the RNN cell in a loop (for i in range(self.n_layers)) where the hidden state is fed from one layer to the next. According to https://arxiv.org/pdf/1312.6026.pdf, this is known as a Deep Transition (DT) RNN. The same paper also describes something called a stacked RNN, which is what I have previously referred to as a Deep RNN. Would it be a good idea to clarify the difference to avoid confusion?
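For what it's worth, here is a minimal sketch of the two variants in PyTorch (the sizes and variable names are made up for illustration, not taken from the tutorial):

```python
import torch
import torch.nn as nn

hidden_size = 8
n_layers = 3

# "Deep transition" style, as in the tutorial's loop: the *same* one-layer GRU
# is applied n_layers times within a single time step, so the hidden state
# passes through several non-linear transformations per step.
gru = nn.GRU(hidden_size, hidden_size)               # single layer, reused
x = torch.randn(1, 1, hidden_size)                   # (seq_len=1, batch=1, features)
hidden = torch.zeros(1, 1, hidden_size)
output = x
for _ in range(n_layers):
    output, hidden = gru(output, hidden)

# "Stacked" style: n_layers *distinct* GRU layers, each with its own weights,
# applied once per time step; layer i consumes the output of layer i-1.
stacked_gru = nn.GRU(hidden_size, hidden_size, num_layers=n_layers)
hidden0 = torch.zeros(n_layers, 1, hidden_size)      # one hidden state per layer
stacked_output, stacked_hidden = stacked_gru(x, hidden0)
```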
Also, I find the hardcoded batch size of 1 a bit confusing. Is there a good reason not to mention batching? It wouldn't make the code much more difficult to read.
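For example, a batched forward pass only changes the second dimension of the input and hidden tensors (the sizes below are arbitrary, just to show the shapes):

```python
import torch
import torch.nn as nn

hidden_size = 8
n_layers = 2
batch_size = 4
seq_len = 5

gru = nn.GRU(hidden_size, hidden_size, num_layers=n_layers)
inputs = torch.randn(seq_len, batch_size, hidden_size)      # batch is the 2nd dim
hidden = torch.zeros(n_layers, batch_size, hidden_size)
outputs, hidden = gru(inputs, hidden)
print(outputs.shape)   # torch.Size([5, 4, 8])
print(hidden.shape)    # torch.Size([2, 4, 8])
```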
Olof
@olofmogren I was confused by the GRU loop too when I checked the official tutorial at http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html. The tutorial in practical-pytorch now uses nn.GRU (which accepts the n_layers parameter, so there is no longer a loop over the GRU) and differs from the official tutorial. I think the "deep transition" behaviour may not have been intended in the official version; it may just be a bug ^_^. That's just my opinion.