visual-instruction-tuning topic

Awesome-Multimodal-Large-Language-Models

11.9k Stars, 765 Forks

✨✨ Latest Advances on Multimodal Large Language Models

Osprey

756 Stars, 43 Forks

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

polite-flamingo

63 Stars, 3 Forks

🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)

VideoTGB

22 Stars, 1 Fork

[EMNLP 2024] A Video Chat Agent with Temporal Prior

DataOptim

74 Stars, 3 Forks

A collection of visual instruction tuning datasets.

lmms-finetune

357 Stars, 41 Forks, 357 Watchers

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.

LLaVA-Mini

546 Stars, 28 Forks, 546 Watchers

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.