pytorch-seq2seq
In the encoder, why not use pack_padded_sequence(embedded, input_lengths)?
Would the results be affected if pack_padded_sequence(embedded, input_lengths) were not used?
Not sure I understand the question, sorry.
Are you asking why we use pack_padded_sequence in notebook 4?
I have a different question related to pack_padded_sequence.
What is batch_sizes in the output of pack_padded_sequence? How is it calculated?
@yugaljain1999
We can try running some code to help us understand the packed sequences batching.
import torch
import torch.nn as nn

max_length = 10
batch_size = 3
emb_dim = 10

# a padded batch of three sequences, shape: [max_length, batch_size, emb_dim]
seqs = torch.zeros(max_length, batch_size, emb_dim)

# the true (unpadded) lengths of each sequence, sorted in descending order
lens = torch.LongTensor([10, 7, 5])

packed_seqs = nn.utils.rnn.pack_padded_sequence(seqs, lens)
print(packed_seqs.batch_sizes)
# prints: tensor([3, 3, 3, 3, 3, 2, 2, 1, 1, 1])
This is an example where we have a batch of three sequences: one of length 10, one of length 7 and one of length 5. The tensor needs a length of 10 as this is the length of the longest sequence; the other two sequences must be padded until they are length 10.
If using packed sequences, we must pass the lengths of the sequences to pack_padded_sequence. Here we pass [10, 7, 5], as a tensor. We can then check the "batch sizes" of this tensor and see that we get [3, 3, 3, 3, 3, 2, 2, 1, 1, 1].
From this we can see that the first 5 elements of the packed sequence have a batch size of 3. This is because all of our sequences are at least length 5. The 6th and 7th elements have a batch size of 2 - this is because we no longer need to feed in the sequence with length 5 as we've gone beyond its length. The last 3 elements have a batch size of 1, as we only have one sequence with a length greater than 7.
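If it helps, batch_sizes can also be reproduced by hand: at time step t, the batch size is just the number of sequences whose length is greater than t. Here's a minimal sketch of that calculation (the manual_batch_sizes helper is just for illustration, it isn't part of PyTorch):

import torch

def manual_batch_sizes(lens):
    # lens: sequence lengths, sorted in descending order
    max_len = max(lens)
    # at time step t, count how many sequences are still "active",
    # i.e. have a length greater than t
    return torch.tensor([sum(l > t for l in lens) for t in range(max_len)])

print(manual_batch_sizes([10, 7, 5]))
# prints: tensor([3, 3, 3, 3, 3, 2, 2, 1, 1, 1])

which matches the packed_seqs.batch_sizes we printed above.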