Vision-and-language topic
vognet-pytorch
[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
awesome-vision-language-navigation
A curated list for vision-and-language navigation, accompanying the ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
LightningDOT
Source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper LightningDOT
MIA
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
multimodal
A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"
rva
Code for the CVPR'19 paper "Recursive Visual Attention in Visual Dialog"
awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)