llava topic

Repositories tagged with llava:

LLaVA (19.5k stars, 2.1k forks)

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
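The repo ships its own training and serving scripts, but the quickest way to try LLaVA-1.5 is through its Hugging Face transformers port. A minimal inference sketch, assuming the llava-hf/llava-1.5-7b-hf checkpoint, transformers >= 4.36 with accelerate installed, and a local example.jpg:

```python
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # community transformers port of LLaVA-1.5
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # any local image
# LLaVA-1.5 chat format: the <image> token marks where visual tokens are spliced in.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```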

maestro (2.7k stars, 220 forks)

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.

awesome-foundation-and-multimodal-models (565 stars, 43 forks)

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

LRV-Instruction (249 stars, 13 forks)

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

HallusionBench (228 stars, 5 forks)

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Video-ChatGPT (1.2k stars, 102 forks)

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for...

VLMEvalKit (1.1k stars, 157 forks)

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks.

LLaVAR (254 stars, 12 forks)

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

ViP-LLaVA (292 stars, 22 forks)

[CVPR'24] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
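ViP-LLaVA's distinguishing feature is that visual prompts are drawn directly onto the image rather than encoded separately. A hedged sketch via the transformers port (llava-hf/vip-llava-7b-hf is assumed to be the right checkpoint, and the prompt template should be verified against its model card): overlay a red box with PIL, then ask about it.

```python
from PIL import Image, ImageDraw
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b-hf"  # assumed transformers port of ViP-LLaVA
model = VipLlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Draw the visual prompt (a red box) straight onto the image.
image = Image.open("example.jpg").convert("RGB")
ImageDraw.Draw(image).rectangle((50, 50, 200, 200), outline="red", width=4)

# Prompt template as given on the model card; check it for the checkpoint you use.
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    "###Human: <image>\nWhat is inside the red box?###Assistant:"
)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```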

llava-docker (68 stars, 12 forks)

Docker image for LLaVA: Large Language and Vision Assistant