practical-pytorch
Batch support in seq2seq tutorial
Hi, thank you for the great work! Would you please add batching to the tutorial as well?
Hello @spro, I have been working on extending the tutorial with batching. My code is here: https://github.com/vijendra-rana/Random/blob/master/translation_with_batch.py (I created some fake data for it). The problem is that the loss computation raises this error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.

I understand we cannot backward through the loss twice, but I don't see where I am doing that. I also have a question about masking: how would you mask the loss at the encoder? I am not sure how to implement it, since the encoder output has size (seq_len, batch, hidden_size) while the mask is (batch_size, seq_len).
Thanks in advance for your help :)
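(For context, a frequent cause of this RuntimeError is reusing a tensor built in an earlier iteration, such as a hidden state carried across batches, so the second backward() walks an already-freed graph. Below is a minimal, self-contained sketch of the usual fix in current PyTorch; the tiny GRU and random data are stand-ins, not the code from the linked script.)

```python
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=8, hidden_size=16)
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.1)
criterion = nn.MSELoss()

hidden = None
for step in range(3):
    inputs = torch.randn(5, 4, 8)     # seq_len x batch x input_size
    targets = torch.randn(5, 4, 16)   # matching output shape
    if hidden is not None:
        # Without this detach, the second backward() tries to traverse the
        # previous iteration's (already freed) graph and raises the error above
        hidden = hidden.detach()
    outputs, hidden = rnn(inputs, hidden)
    loss = criterion(outputs, targets)
    optimizer.zero_grad()
    loss.backward()                   # safe: each iteration builds a fresh graph
    optimizer.step()
```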
I put a first version of the batched model at https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation-batched.ipynb via 31fdb61387e62948f6a24dc9a2dadd6d3221a73c.
The biggest changes are using pack_padded_sequence before the encoder RNN and pad_packed_sequence after it, plus the masked cross entropy loss from @jihunchoi after decoding. The decoder itself changes very little because it runs only one time step at a time.
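(For readers following along, here is a minimal sketch of that packing pattern in current PyTorch, with made-up shapes. It also builds a mask in the (batch_size, seq_len) layout asked about above; this is an illustration, not the notebook's exact code.)

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

max_len, batch_size, embed_size, hidden_size = 10, 4, 8, 16
embedded = torch.randn(max_len, batch_size, embed_size)  # padded batch, seq-major
lengths = torch.tensor([10, 7, 5, 2])                    # true lengths, sorted descending

rnn = nn.GRU(embed_size, hidden_size)
packed = pack_padded_sequence(embedded, lengths)   # the RNN skips the padded steps
packed_outputs, hidden = rnn(packed)
outputs, _ = pad_packed_sequence(packed_outputs)   # back to max_len x batch x hidden_size

# A (batch_size, max_len) mask like the one discussed above: True at real tokens
mask = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)
```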
Thanks, @spro, for your effort in putting these together. Your tutorials are really nice.
Hi guys, I implemented more features based on this tutorial (e.g. batched computation for attention) and added some notes. Check out my repo here: https://github.com/howardyclo/pytorch-seq2seq-example/blob/master/seq2seq.ipynb
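(As a taste of what batched attention looks like, here is a minimal sketch of dot-product attention scored with a single bmm over the whole batch, in place of the per-example double loop in the tutorial's Attn module. Shapes are made up for illustration.)

```python
import torch

batch_size, max_len, hidden_size = 4, 10, 16
decoder_state = torch.randn(batch_size, 1, hidden_size)        # current decoder hidden, batch-first
encoder_outputs = torch.randn(batch_size, max_len, hidden_size)

# One bmm scores every encoder position for the whole batch at once,
# instead of looping over batch elements and time steps one by one
scores = decoder_state.bmm(encoder_outputs.transpose(1, 2))    # batch x 1 x max_len
weights = torch.softmax(scores, dim=2)                         # attention distribution
context = weights.bmm(encoder_outputs)                         # batch x 1 x hidden_size
```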
I noticed some implementations of batched seq2seq with attention allow an embedding size that is different than the hidden size. Is there a reason to match the two sizes?
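(For what it's worth, the RNN modules themselves don't require the two sizes to match: nn.GRU takes separate input_size and hidden_size arguments, and the input-to-hidden projection is learned inside the GRU. A minimal sketch with deliberately different sizes:)

```python
import torch
import torch.nn as nn

vocab_size, embed_size, hidden_size = 10000, 128, 256  # embed_size != hidden_size

embedding = nn.Embedding(vocab_size, embed_size)
rnn = nn.GRU(input_size=embed_size, hidden_size=hidden_size)

tokens = torch.randint(0, vocab_size, (10, 4))   # seq_len x batch
embedded = embedding(tokens)                     # 10 x 4 x embed_size
outputs, hidden = rnn(embedded)                  # 10 x 4 x hidden_size
```

Tying the two sizes looks like a simplification in the tutorial (one fewer hyperparameter), not an API requirement.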
@spro Thanks for the nice code sample. I had some trouble and am looking for some help: I tried to run it out of the box and hit an error in this block:
max_target_length = max(target_lengths)
decoder_input = Variable(torch.LongTensor([SOS_token] * small_batch_size))
decoder_hidden = encoder_hidden[:decoder_test.n_layers] # Use last (forward) hidden state from encoder
all_decoder_outputs = Variable(torch.zeros(max_target_length, small_batch_size, decoder_test.output_size))

if USE_CUDA:
    all_decoder_outputs = all_decoder_outputs.cuda()
    decoder_input = decoder_input.cuda()

# Run through decoder one time step at a time
for t in range(max_target_length):
    decoder_output, decoder_hidden, decoder_attn = decoder_test(
        decoder_input, decoder_hidden, encoder_outputs
    )
    all_decoder_outputs[t] = decoder_output # Store this step's outputs
    decoder_input = target_batches[t] # Next input is current target

# Test masked cross entropy loss
loss = masked_cross_entropy(
    all_decoder_outputs.transpose(0, 1).contiguous(),
    target_batches.transpose(0, 1).contiguous(),
    target_lengths
)
print('loss', loss.data[0])
The error reads as follows:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-28-babf231e41ef> in <module>()
13 for t in range(max_target_length):
14 decoder_output, decoder_hidden, decoder_attn = decoder_test(
---> 15 decoder_input, decoder_hidden, encoder_outputs
16 )
17 all_decoder_outputs[t] = decoder_output # Store this step's outputs
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
489 result = self._slow_forward(*input, **kwargs)
490 else:
--> 491 result = self.forward(*input, **kwargs)
492 for hook in self._forward_hooks.values():
493 hook_result = hook(self, input, result)
<ipython-input-24-43d7954b3ba4> in forward(self, input_seq, last_hidden, encoder_outputs)
35 # Calculate attention from current RNN state and all encoder outputs;
36 # apply to encoder outputs to get weighted average
---> 37 attn_weights = self.attn(rnn_output, encoder_outputs)
38 context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x S=1 x N
39
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
489 result = self._slow_forward(*input, **kwargs)
490 else:
--> 491 result = self.forward(*input, **kwargs)
492 for hook in self._forward_hooks.values():
493 hook_result = hook(self, input, result)
<ipython-input-22-61485b548d0f> in forward(self, hidden, encoder_outputs)
27 # Calculate energy for each encoder output
28 for i in range(max_len):
---> 29 attn_energies[b, i] = self.score(hidden[:, b], encoder_outputs[i, b].unsqueeze(0))
30
31 # Normalize energies to weights in range 0 to 1, resize to 1 x B x S
<ipython-input-22-61485b548d0f> in score(self, hidden, encoder_output)
40 elif self.method == 'general':
41 energy = self.attn(encoder_output)
---> 42 energy = hidden.dot(energy)
43 return energy
44
RuntimeError: Expected argument self to have 1 dimension, but has 2
@suwangcompling You can try hidden = hidden.squeeze() and encoder_output = encoder_output.squeeze() inside score() before the dot product!
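(To expand on that: in recent PyTorch, Tensor.dot only accepts 1-D tensors, while the notebook passes 1 x hidden_size slices into score(), hence the error above. A minimal standalone sketch of the squeezed dot product for the 'general' scoring method, with shapes mirroring the traceback:)

```python
import torch
import torch.nn as nn

hidden_size = 16
attn = nn.Linear(hidden_size, hidden_size)      # the 'general' method's learned projection

hidden = torch.randn(1, hidden_size)            # decoder state slice: 1 x hidden_size
encoder_output = torch.randn(1, hidden_size)    # one encoder time step: 1 x hidden_size

energy = attn(encoder_output)                   # still 1 x hidden_size
# torch.dot requires 1-D arguments, so drop the leading dimension first
score = hidden.squeeze(0).dot(energy.squeeze(0))
print(score.item())
```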