vision-and-language topic
TubeDETR
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
FVTA_MemexQA
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
DallEval
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)
Oscar
Oscar and VinVL
xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics(e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
image-captioning
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
ALBEF
Code for ALBEF: a new vision-language pre-training method
Proctoring-AI
Creating a software for automatic monitoring in online proctoring
VL-BERT
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".