ACRM-for-moment-retrieval
How are videos of different lengths loaded into a tensor? And does "frame" in the paper correspond to a clip of the video in the code?
Hello, I have two questions:
- In a batch, how do you load video features of different lengths into a single tensor? Do you pad them to a max length? If so, what is the max length?
- Does the "frame" in the paper correspond to a clip of the video in the code, i.e., to a segment of consecutive video frames?
Hi. 1. The function rnns.pad_sequence at line 27 of ./data/collate_batch.py handles this (see def pad_sequence in ./utils/rnns.py for details). It is ultimately implemented with nn.utils.rnn.pad_sequence, an official PyTorch function. For a batch of data, this function zero-pads all tensors so that the length of every tensor matches the longest one in that batch.
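A minimal sketch of this padding behavior (not the repo's exact collate function; the feature dimension and sequence lengths below are hypothetical):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three videos with 10, 25, and 17 clip-level features, each 500-dim (made-up sizes).
feats = [torch.randn(10, 500), torch.randn(25, 500), torch.randn(17, 500)]
lengths = torch.tensor([f.size(0) for f in feats])

# Zero-pad every sequence to the longest one in the batch (25 here).
padded = pad_sequence(feats, batch_first=True)

print(padded.shape, lengths)  # torch.Size([3, 25, 500]) tensor([10, 25, 17])
```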
2. Yes, the "frame" in the paper actually corresponds to a segment of consecutive video frames. This is common in the VML task, since a C3D or I3D extractor is usually adopted to encode the video first, and it embeds 8/16 consecutive frames into one feature vector.
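For illustration only (all sizes are assumptions, and the feature step is a stand-in for the actual C3D/I3D network), this is how a video of T raw frames ends up as one feature vector per window of 16 consecutive frames:

```python
import torch

T, clip_len, feat_dim = 128, 16, 512       # hypothetical sizes
frames = torch.randn(T, 3, 112, 112)       # raw RGB frames of one video

# Group every 16 consecutive frames into one clip: (8, 16, 3, 112, 112).
clips = frames.view(T // clip_len, clip_len, 3, 112, 112)

# A real C3D/I3D extractor would map each clip to a feature vector;
# this random tensor only stands in to show the resulting shape.
clip_features = torch.randn(clips.size(0), feat_dim)
print(clip_features.shape)                 # torch.Size([8, 512])
```

Each row of clip_features is what the paper refers to as a "frame" feature.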