large-multimodal-models topic
VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
Awesome-Multimodal-Papers
A curated list of awesome multimodal studies.
MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
IVM
[NeurIPS 2024] The official implementation of "Instruction-Guided Visual Masking"
OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
ShareGPT4Video
[NeurIPS 2024 D&B Track] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
vhs_benchmark
🔥 Official Benchmark Toolkits for "Visual Haystacks: Answering Harder Questions About Sets of Images"