lmm topic
multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
awesome-Large-MultiModal-Hallucination
😎 An up-to-date, curated list of papers, methods & resources on LMM hallucination.
Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation,...
LLaVA-CLI-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
graphist
Official repo of Graphist.