video_to_sequence
Question about n_lstm_steps
Hi! Thank you for your excellent work! I have learned a lot from your implementation. But I have a small question about n_lstm_steps. You give the encoding and decoding stages the same n_lstm_steps, so each stage unrolls for 80 time steps. However, the original paper says that
"... we unroll the LSTM to a fixed 80 time steps during training. ... to ensure that the sum of the number of frames and words is within this limit."
So I am curious whether the difference between these two strategies will change the model's behavior significantly or not (see the sketch below).
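To make the comparison concrete, here is a minimal sketch of the two unrolling budgets. The step counts and variable names are hypothetical, chosen only for illustration; this is not code from the repo:

```python
n_lstm_steps = 80  # fixed unroll length, as in the paper

# This implementation: encoder and decoder each unroll n_lstm_steps times,
# so the computation graph spans 2 * n_lstm_steps = 160 steps in total.
total_steps_impl = 2 * n_lstm_steps

# The paper: a single fixed 80-step unroll shared by frames and words,
# so the number of frames plus the number of words must fit the budget.
n_frames, n_words = 50, 30  # hypothetical split that respects the limit
assert n_frames + n_words <= n_lstm_steps
total_steps_paper = n_lstm_steps

print(total_steps_impl, total_steps_paper)  # 160 80
```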
Besides, I also want to know whether padding the video frames with zeros introduces some error. In my opinion, the zero-padded frames will affect the hidden state of the encoder LSTM, so the encoded information will include some noise. Is that right?
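As a sanity check on this intuition, here is a toy NumPy sketch of a generic recurrent update (not the repo's LSTM; the weights and the `step` function are made up for illustration). It shows that an all-zero input still moves the hidden state through the recurrent weights and the bias:

```python
import numpy as np

# Toy recurrent update h_t = tanh(Wh @ h + Wx @ x + b), illustrative only.
rng = np.random.default_rng(0)
dim = 4
Wh = rng.normal(size=(dim, dim))
Wx = rng.normal(size=(dim, dim))
b = rng.normal(size=dim)

def step(h, x):
    return np.tanh(Wh @ h + Wx @ x + b)

h = step(np.zeros(dim), rng.normal(size=dim))  # state after one real frame
h_padded = step(h, np.zeros(dim))              # feed an all-zero padding frame

# The padding step still changes the state via Wh @ h and b, so the final
# encoding differs from simply stopping at the last real frame.
print(np.allclose(h, h_padded))  # False in general
```

So yes, in a plain unrolled RNN the padded steps do perturb the state; one common remedy is to mask the padded time steps so the previous hidden state is carried through unchanged.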
Thanks!