Video-guided-Machine-Translation
[Suggestion] Support/Provide global video features
@eric-xw @zzxslp Currently, each video is represented by a NumPy array of shape (1, num_of_segments, 1024). Since many of the original videos are no longer available, would it be possible for you to provide a pooled/global feature for each video (shape (1, D))?
Such pooled representations are widely used in image-guided NMT, for example on the Multi30K dataset, and I believe they would also benefit research in VMT.
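To illustrate the shape I have in mind, here is a minimal sketch that mean-pools the existing segment features over the segment axis. Mean pooling is only my assumption about how a global feature could be derived; the function name `pool_video_feature` and the random stand-in array are hypothetical, and a feature extracted directly from the full video may well differ from this.

```python
import numpy as np


def pool_video_feature(segment_feats: np.ndarray) -> np.ndarray:
    """Mean-pool segment features of shape (1, num_of_segments, D) into (1, D).

    Mean pooling is an assumption made for illustration; it is not
    necessarily how the repository authors would build a global feature.
    """
    if segment_feats.ndim != 3 or segment_feats.shape[0] != 1:
        raise ValueError(
            f"expected shape (1, num_of_segments, D), got {segment_feats.shape}"
        )
    # Average over the segment axis (axis=1); the leading batch axis is kept.
    return segment_feats.mean(axis=1)


# Random stand-in for one video's features (32 segments, D = 1024).
rng = np.random.default_rng(0)
feats = rng.standard_normal((1, 32, 1024)).astype(np.float32)

pooled = pool_video_feature(feats)
print(pooled.shape)
```

A single (1, D) vector per video could then be used the same way a global image feature is used in image-guided NMT.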