vision-and-language topic

List vision-and-language repositories

VLDet

179 stars, 11 forks

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

LRV-Instruction

249 stars, 13 forks

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Perceiver_VL

32 stars, 3 forks

PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)

VisualNews-Repository

83 stars, 9 forks

[EMNLP'21] Visual News: Benchmark and Challenges in News Image Captioning

sugar-crepe

59 stars, 7 forks

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

PointLLM

538 stars, 24 forks

[ECCV 2024 Oral] PointLLM: Empowering Large Language Models to Understand Point Clouds

LLaVAR

254 stars, 12 forks

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

VidSitu

56 stars, 8 forks

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

hulc2

30 stars, 2 forks

[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data