LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond.
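A minimal inference sketch for trying the model, assuming the Hugging Face Transformers port of LLaVA-1.5 (llava-hf/llava-1.5-7b-hf) rather than the repository's own CLI; the sample image URL comes from the LLaVA project page.

```python
# Sketch: single-image LLaVA-1.5 inference via the Transformers port
# (llava-hf/llava-1.5-7b-hf), an assumption; not the upstream repo's CLI.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# LLaVA-1.5 expects the <image> placeholder inside a USER/ASSISTANT prompt.
prompt = "USER: <image>\nDescribe this image in one sentence. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```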
multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
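To make the idea concrete, here is a model-agnostic sketch of set-of-mark style prompting, the technique maestro builds on, written with plain PIL rather than the library's own API; draw_marks, scene.jpg, and the box coordinates are all hypothetical.

```python
# Sketch of set-of-mark prompting: overlay numbered marks on candidate
# regions so the multimodal model can answer by mark id. Pure PIL; the
# helper name, image file, and boxes below are hypothetical.
from PIL import Image, ImageDraw

def draw_marks(image: Image.Image, boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    """Draw numbered rectangles so a text prompt can refer to regions by id."""
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for i, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=3)
        draw.text((x0 + 4, y0 + 4), str(i), fill="red")
    return marked

image = Image.open("scene.jpg")                    # hypothetical input image
boxes = [(40, 60, 200, 220), (250, 80, 400, 300)]  # hypothetical region proposals
marked_image = draw_marks(image, boxes)

# The marked image plus a mark-aware question is what gets sent to the LMM.
prompt = "The image contains numbered marks. Which mark covers the dog? Answer with a mark id."
```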
awesome-foundation-and-multimodal-models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Video-ChatGPT
[ACL'24 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos, combining the capabilities of LLMs with a pretrained visual encoder.
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ Hugging Face models, and 20+ benchmarks.
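A quickstart sketch following the pattern in the toolkit's README; the registry key 'llava_v1.5_7b', the asset path, and the shape of the generate() call are assumptions that may vary across releases.

```python
# Sketch: single-image inference through VLMEvalKit's model registry.
# The registry key and generate() signature are assumptions taken from
# the project's README and may differ between versions.
from vlmeval.config import supported_VLM

model = supported_VLM["llava_v1.5_7b"]()  # instantiate a registered LVLM
# Single-image query: [image path, question]
response = model.generate(["assets/apple.jpg", "What is in this image?"])
print(response)
```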
LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
ViP-LLaVA
[CVPR'24] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
llava-docker
Docker image for LLaVA: Large Language and Vision Assistant