ACRM-for-moment-retrieval
How are videos of different lengths loaded into a tensor? And does "frame" in the paper correspond to a clip of the video in the code?
Hello, I have two questions:
- In a batch, how do you load video features of different lengths into a single tensor? Do you pad them to a max length? If so, what is the max length?
- Does the "frame" in the paper correspond to a clip of the video in the code, i.e., to a segment of consecutive video frames?
Hi. 1. The function rnns.pad_sequence at line 27 of ./data/collate_batch.py handles this (see def pad_sequence in ./utils/rnns.py for details). It is ultimately implemented with nn.utils.rnn.pad_sequence, an official PyTorch function. For a batch of data, this function zero-pads all tensors so that the length of every tensor matches the longest one in that batch.
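A minimal sketch of this padding behavior (not the repo's exact collate function; the feature dimension and sequence lengths below are hypothetical):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three videos with 10, 25, and 17 clip-level features, each 500-dim (made-up sizes).
feats = [torch.randn(10, 500), torch.randn(25, 500), torch.randn(17, 500)]
lengths = torch.tensor([f.size(0) for f in feats])

# Zero-pad every sequence to the longest one in the batch (25 here).
padded = pad_sequence(feats, batch_first=True)

print(padded.shape, lengths)  # torch.Size([3, 25, 500]) tensor([10, 25, 17])
```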
2. Yes, the "frame" in the paper actually corresponds to a segment of consecutive video frames. This is common in the VML task, since a C3D or I3D extractor is usually adopted to encode the video first, and it embeds 8/16 consecutive frames into one feature vector.
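For illustration only (all sizes are assumptions, and the feature step is a stand-in for the actual C3D/I3D network), this is how a video of T raw frames ends up as one feature vector per window of 16 consecutive frames:

```python
import torch

T, clip_len, feat_dim = 128, 16, 512       # hypothetical sizes
frames = torch.randn(T, 3, 112, 112)       # raw RGB frames of one video

# Group every 16 consecutive frames into one clip: (8, 16, 3, 112, 112).
clips = frames.view(T // clip_len, clip_len, 3, 112, 112)

# A real C3D/I3D extractor would map each clip to a feature vector;
# this random tensor only stands in to show the resulting shape.
clip_features = torch.randn(clips.size(0), feat_dim)
print(clip_features.shape)                 # torch.Size([8, 512])
```

Each row of clip_features is what the paper refers to as a "frame" feature.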