video-caption.pytorch Is it necessary to use a vid2hid layer before the rnn cell?

Open Nash2325138 opened this issue 6 years ago • 0 comments

https://github.com/xiadingZ/video-caption.pytorch/blob/9e4759d9a6b48a72c005bba7c3bb9c53065f1f28/models/EncoderRNN.py#L25

As the title says: why do we need a separate linear layer to transform the video features when the RNN already applies its own linear input projection inside the cell?

If the purpose is to reduce the number of parameters, would it be better to specify the RNN input dimension with a separate variable? For instance:

self.vid2hid = nn.Linear(dim_vid, dim_rnn_input)
...
self.rnn = self.rnn_cell(dim_rnn_input, dim_hidden, n_layers, batch_first=True,
                         bidirectional=bidirectional, dropout=self.rnn_dropout_p)
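
To make the parameter argument concrete, here is a rough count in plain Python (no PyTorch needed). It assumes a single unidirectional GRU layer with PyTorch's parameter layout (three gates, each with input weights, hidden weights, and two bias vectors). The sizes 2048, 256, and 512 are hypothetical values chosen for illustration, not taken from the repository.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def gru_params(input_size, hidden_size):
    """Parameters of one unidirectional GRU layer, assuming PyTorch's layout:
    3 gates x (input weights + hidden weights) plus two bias vectors per gate."""
    return 3 * hidden_size * (input_size + hidden_size) + 2 * 3 * hidden_size


# Hypothetical sizes, for illustration only.
dim_vid, dim_rnn_input, dim_hidden = 2048, 256, 512

# Option A: feed the raw video features straight into the GRU.
direct = gru_params(dim_vid, dim_hidden)

# Option B: project with vid2hid first, then run a narrower GRU.
projected = linear_params(dim_vid, dim_rnn_input) + gru_params(dim_rnn_input, dim_hidden)

print(f"direct GRU:        {direct:,} parameters")
print(f"vid2hid + GRU:     {projected:,} parameters")
```

With these sizes the projected variant is smaller, because the wide input is multiplied by one matrix instead of three gate matrices. The saving depends on dim_rnn_input being smaller than dim_vid, which is why exposing it as its own variable seems useful.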

Dec 08 '18 09:12 Nash2325138
