visual-language-learning topic

A list of repositories tagged with the visual-language-learning topic:

LLaVA
19.5k stars, 2.1k forks, 135 watchers

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Otter
3.6k stars, 242 forks, 85 watchers

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

BLIVA
264 stars, 27 forks

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions

llava-docker
68 stars, 12 forks

Docker image for LLaVA: Large Language and Vision Assistant

NExT-GPT
3.2k stars, 319 forks

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

InternLM-XComposer
2.5k stars, 152 forks

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

RLHF-V
219 stars, 6 forks

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

KarmaVLM
83 stars, 3 forks

🧘🏻‍♂️ KarmaVLM (相生): A family of high-efficiency and powerful visual language models.

Open-LLaVA-NeXT
247 stars, 10 forks

An open-source implementation for training LLaVA-NeXT.

llama-multimodal-vqa
33 stars, 5 forks

Multimodal Instruction Tuning for Llama 3
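A listing like the one above can be regenerated from GitHub's public REST search API, which supports a topic: qualifier. A minimal sketch in Python; the field names (stargazers_count, forks_count, description) follow GitHub's documented search-response schema, and the sample counts in the usage note are illustrative, not live data:

```python
import urllib.parse

# GitHub's repository search endpoint; topic: qualifiers filter by topic tag.
API = "https://api.github.com/search/repositories"

def build_search_url(topic: str, sort: str = "stars") -> str:
    """URL that lists repositories tagged with `topic`, most-starred first."""
    query = urllib.parse.urlencode(
        {"q": f"topic:{topic}", "sort": sort, "order": "desc"}
    )
    return f"{API}?{query}"

def format_entry(repo: dict) -> str:
    """Render one search result in the same shape as the listing above."""
    return (
        f"{repo['name']}\n"
        f"{repo['stargazers_count']} stars, {repo['forks_count']} forks\n\n"
        f"{repo.get('description') or ''}"
    )

# To actually fetch, pass build_search_url(...) to urllib.request.urlopen;
# note that unauthenticated search requests are heavily rate-limited.
print(build_search_url("visual-language-learning"))
```

Calling format_entry on each item of the response's "items" array reproduces the name / stats / description layout used in this list.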