video-language topic
UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Multi-Modal-Transformer
The repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers and self-supervised l...
all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
ReferFormer
[CVPR2022] Official Implementation of ReferFormer
ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
EgoVLP
[NeurIPS2022] Egocentric Video-Language Pretraining
Region_Learner
The PyTorch implementation for "Video-Text Pre-training with Learned Regions"
VidIL
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
Perceiver_VL
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)