Topic: multimodal-large-language-models

Repositories in this topic

Ovis

1.4k stars · 83 forks
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Parrot

25 stars · 1 fork

🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.

lmms-finetune

357 stars · 41 forks

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

EVF-SAM

250 stars · 8 forks

Official code for "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model".

multimodal-chat

88 stars · 11 forks

A multimodal chat interface with support for many external tools.

LLaMA-Omni

2.0k stars · 111 forks

LLaMA-Omni is a low-latency, high-quality end-to-end speech interaction model built on Llama-3.1-8B-Instruct, aiming for GPT-4o-level speech capabilities.

VITA

801 stars · 41 forks

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Video-MME

695 stars · 25 forks

✨✨[CVPR 2025] Video-MME: The first comprehensive evaluation benchmark for multimodal LLMs in video analysis.

MLVU

139 stars · 0 forks

🔥🔥MLVU: A multi-task long-video understanding benchmark.

ml-slowfast-llava

129 stars · 9 forks

SlowFast-LLaVA: A strong training-free baseline for video large language models.