msrvtt topic
video-captioning-models-in-Pytorch
A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSRVTT datasets.
UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Semantics-AssistedVideoCaptioning
Source code for "Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling Strategy"
VidIL
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
MAC
An end-to-end masked contrastive video-and-language pre-training framework
X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"