lmm topic

List lmm repositories

maestro

Stars: 2.6k, Forks: 219, Watchers: 2.6k

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

HallusionBench

Stars: 228, Forks: 5

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Video-LLaVA

Stars: 238, Forks: 11

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

groundingLMM

Stars: 744, Forks: 37

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

InternLM-XComposer

Stars: 2.5k, Forks: 152

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

LLaVA-Interactive-Demo

Stars: 340, Forks: 25

LLaVA-Interactive-Demo

awesome-Large-MultiModal-Hallucination

Stars: 135, Forks: 11

😎 Up-to-date, curated list of awesome LMM hallucination papers, methods & resources.

Cradle

Stars: 2.3k, Forks: 226, Watchers: 2.3k

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation,...

LLaVA-CLI-with-multiple-images

Stars: 45, Forks: 4

LLaVA inference with multiple images at once for cross-image analysis.