visual-language-learning topic
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
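A minimal inference sketch for trying LLaVA, assuming the community-converted llava-hf/llava-1.5-7b-hf checkpoint on the Hugging Face Hub and its transformers integration; the image path and prompt wording are placeholders, not taken from the LLaVA repository itself.

    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community-converted checkpoint
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)

    image = Image.open("example.jpg")  # placeholder path to any local image
    # LLaVA-1.5-style chat prompt: the <image> token marks where image features are inserted
    prompt = "USER: <image>\nDescribe this image. ASSISTANT:"

    inputs = processor(text=prompt, images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=100)
    print(processor.decode(output_ids[0], skip_special_tokens=True))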
Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
BLIVA
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
llava-docker
Docker image for LLaVA: Large Language and Vision Assistant
NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
KarmaVLM
🧘🏻‍♂️ KarmaVLM (相生): A family of high-efficiency, powerful visual language models.
Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
llama-multimodal-vqa
Multimodal Instruction Tuning for Llama 3