
dimensionality error on GPU for changed number of encoder layers

Open · dieuwkehupkes opened this issue Nov 20 '17 · 3 comments

I am encountering a strange problem when I try to change the number of layers in the encoder: when I run the script on a CPU it runs without problems, but when I call the exact same script on a GPU it gives me a dimensionality error. The only thing I changed is the construction of the encoder in the sample.py script:

encoder = EncoderRNN(len(src.vocab), max_len, hidden_size, n_layers=2, bidirectional=bidirectional, variable_lengths=True)

This results in the following error when forward is called:

File "/home/dhupkes/.local/lib/python2.7/site-packages/torch/backends/cudnn/rnn.py", line 266, in forward hidden_size, tuple(hx.size()))) RuntimeError: Expected hidden size (1, 32L, 256), got (2L, 32L, 256L)

I assumed this was due to what is passed to the decoder, but when I started to debug on a CPU I discovered, to my surprise, that the error was not raised there with the exact same script.
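To make the shape question concrete, here is a minimal sketch in plain PyTorch of how a 2-layer encoder state trips a 1-layer decoder. The GRUs below are illustrative stand-ins for the library's EncoderRNN/DecoderRNN, not its actual code, and the sizes just mirror the error message above:

import torch
import torch.nn as nn

batch, seq_len, hidden_size = 32, 10, 256

# Illustrative stand-ins: the encoder now has 2 layers, the decoder still has 1.
encoder_rnn = nn.GRU(hidden_size, hidden_size, num_layers=2, batch_first=True)
decoder_rnn = nn.GRU(hidden_size, hidden_size, num_layers=1, batch_first=True)

src = torch.randn(batch, seq_len, hidden_size)
tgt = torch.randn(batch, seq_len, hidden_size)

_, encoder_hidden = encoder_rnn(src)  # encoder_hidden: (2, 32, 256)

# The 1-layer decoder expects an initial hidden state of shape (1, 32, 256),
# so this raises: RuntimeError: Expected hidden size (1, 32, 256), got (2, 32, 256)
decoder_rnn(tgt, encoder_hidden)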

Does anyone have an idea what is going on?

dieuwkehupkes · Nov 20 '17, 16:11

Hi, please include n_layers=2 in the decoder too; that seems to fix the issue:

encoder = EncoderRNN(len(src.vocab), max_len, hidden_size, n_layers=2,
                     bidirectional=bidirectional, variable_lengths=True)
decoder = DecoderRNN(len(tgt.vocab), max_len, hidden_size * 2 if bidirectional else hidden_size,
                     n_layers=2, dropout_p=0.2, use_attention=True, bidirectional=bidirectional,
                     eos_id=tgt.eos_id, sos_id=tgt.sos_id)

tejaswini · Nov 25 '17, 15:11

Hey, thanks for looking into it. Did you also run this on a GPU? I already tried that, but it didn't solve the problem (again no errors on the CPU, only on the GPU). Why do you think this should help? I don't see any theoretical reason why the encoder should have the same number of layers as the decoder (and it is in fact not what I would like to have in my model).

dieuwkehupkes · Nov 25 '17, 17:11

Hi, it turns out the discrepancy in behaviour between CPU and GPU is a bug in PyTorch; we filed https://github.com/pytorch/pytorch/issues/4002 for it. Currently we only support encoders and decoders with the same number of hidden layers. We created https://github.com/IBM/pytorch-seq2seq/issues/107 to track support for what you are doing.
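In the meantime, a hand-rolled workaround in plain PyTorch (illustrative only, not part of this library's API) is to keep just the top layers of the encoder's final hidden state when the decoder has fewer layers. The GRUs below are the same kind of stand-ins as in the sketch above:

import torch
import torch.nn as nn

batch, seq_len, hidden_size = 32, 10, 256

encoder_rnn = nn.GRU(hidden_size, hidden_size, num_layers=2, batch_first=True)
decoder_rnn = nn.GRU(hidden_size, hidden_size, num_layers=1, batch_first=True)

src = torch.randn(batch, seq_len, hidden_size)
tgt = torch.randn(batch, seq_len, hidden_size)

_, encoder_hidden = encoder_rnn(src)  # (2, 32, 256)

# Keep only the top decoder_rnn.num_layers layers so the initial hidden
# state matches what the 1-layer decoder expects: (1, 32, 256).
init_hidden = encoder_hidden[-decoder_rnn.num_layers:].contiguous()
output, _ = decoder_rnn(tgt, init_hidden)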

tejaswini · Dec 04 '17, 15:12