vision-and-language topic

List vision-and-language repositories

TubeDETR

161
Stars
8
Forks

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

FVTA_MemexQA

33
Stars
15
Forks

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

CLIP-Caption-Reward

227
Stars
26
Forks

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

DallEval

135
Stars
5
Forks

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)

xmodaler

1.0k
Stars
112
Forks

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

image-captioning

269
Stars
52
Forks

Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

ALBEF

1.4k
Stars
191
Forks

Code for ALBEF: a new vision-language pre-training method

Proctoring-AI

526
Stars
323
Forks

Creating software for automatic monitoring in online proctoring

VL-BERT

734
Stars
110
Forks

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".