vision-and-language topic
pytorch_ldast
A PyTorch implementation of LDAST
VLDet
[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)
LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Perceiver_VL
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
VisualNews-Repository
[EMNLP'21] Visual News: Benchmark and Challenges in News Image Captioning
sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
PointLLM
[ECCV 2024 Oral] PointLLM: Empowering Large Language Models to Understand Point Clouds
LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
VidSitu
[CVPR'21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
hulc2
[ICRA 2023] Grounding Language with Visual Affordances over Unstructured Data