video-language topics

UniVL

330

Stars

54

Forks

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

microsoft

alignment

caption

caption-task

coin

The repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and self-supervised l...

junchen14

efficiency-transformer

image-transformer

language

mlp-mixer

all-in-one

273

Stars

16

Forks

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training

showlab

codebase

pre-training

pytorch

video-language

ReferFormer

310

Stars

26

Forks

[CVPR2022] Official Implementation of ReferFormer

wjn922

referring-video-object-segmentation

video-language

ALPRO

184

Stars

18

Forks

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

salesforce

prompt-learning

representation-learning

video-language

video-question-answering

EgoVLP

208

Stars

19

Forks

[NeurIPS2022] Egocentric Video-Language Pretraining

showlab

egocentric-vision

pretraining

pytorch

video-language

Region_Learner

42

Stars

4

Forks

The PyTorch implementation for "Video-Text Pre-training with Learned Regions"

showlab

video-language

VidIL

112

Stars

1

Forks

PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"

MikeWangWZHL

blip

clip

gpt-3

msrvtt

Perceiver_VL

32

Stars

3

Forks

PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)

zinengtang

efficiency

retrieval

scalability

video-language

VidSitu

56

Stars

8

Forks

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

TheShadow29

captioning

captioning-videos

event-relations

grounding