large-vision-language-models topic
Awesome_Matching_Pretraining_Transfering
A paper list covering large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching for preliminary insight.
Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
ReForm-Eval
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
talk2bev
Talk2BEV: Language-Enhanced Bird's Eye View Maps (Accepted to ICRA'24)
CHOCOLATE
Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
LLaVA-Align
This is the official repo for Debiasing Large Visual Language Models, including a post-hoc debiasing method and a Visual Debias Decoding strategy; a hedged sketch of the general decoding idea follows this entry.
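The description names the methods but does not spell them out. Below is a minimal, hypothetical sketch of the general flavor of debiased (contrastive) decoding for an LVLM: contrast logits conditioned on the real image against logits conditioned on an uninformative visual input so that language-prior tokens are down-weighted. The exact formulation in the repo may differ, and `model`, `debias_decode_step`, and all parameter names here are assumptions for illustration only.

```python
import torch

def debias_decode_step(model, input_ids, image, blank_image, alpha=1.0):
    """One greedy decoding step that contrasts image-conditioned logits with
    logits from an uninformative visual input. `model` is assumed to return
    next-token logits given (input_ids, image); names are hypothetical."""
    with torch.no_grad():
        logits_img = model(input_ids, image)          # conditioned on the real image
        logits_blank = model(input_ids, blank_image)  # conditioned on an "empty" visual input
    # Contrastive combination: amplify evidence that actually depends on the image,
    # subtract what the model would predict from its language prior alone.
    debiased = (1 + alpha) * logits_img - alpha * logits_blank
    return debiased.argmax(dim=-1)                    # next token id
```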
DoRA
[ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
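Since the title states the technique, here is a minimal PyTorch sketch of weight-decomposed low-rank adaptation as described in the DoRA paper: the frozen pretrained weight is split into a learnable magnitude and a direction, and only the direction receives a LoRA-style low-rank update. This is an illustrative module, not the official implementation; the class and attribute names are hypothetical.

```python
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    """Sketch of DoRA applied to a linear layer: magnitude-direction
    decomposition of the frozen weight, low-rank update on the direction."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        out_f, in_f = base.weight.shape
        self.weight = base.weight                 # frozen pretrained weight W0
        self.weight.requires_grad_(False)
        self.bias = base.bias
        # Low-rank direction update: delta W = B @ A (B starts at zero)
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        # Learnable magnitude, initialized to the column-wise norm of W0
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=0, keepdim=True))

    def forward(self, x):
        delta = self.lora_B @ self.lora_A                   # low-rank update
        directional = self.weight + delta                   # updated direction
        norm = directional.norm(p=2, dim=0, keepdim=True)   # column-wise norm
        w = self.magnitude * directional / norm             # re-scaled weight
        return nn.functional.linear(x, w, self.bias)
```

Only `lora_A`, `lora_B`, and `magnitude` are trainable, which keeps the parameter count close to LoRA while letting the magnitude and direction of each weight column adapt separately.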
VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.