multimodal-large-language-models topic

List multimodal-large-language-models repositories

LLaVA-Mini

Stars: 546 | Forks: 28 | Watchers: 546

LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.

VideoChat

Stars: 1.2k | Forks: 148 | Watchers: 1.2k

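Real-time voice-interactive digital human that supports both end-to-end voice and cascaded pipelines. Appearance and voice timbre are customizable without training, voice cloning is supported, and first-packet latency is as low as 3 s. Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice...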

vlm

Stars: 15 | Forks: 2 | Watchers: 15

Composition of Multimodal Language Models From Scratch

ALM-Bench

Stars: 45 | Forks: 3 | Watchers: 45

[CVPR 2025 🔥] ALM-Bench is a multilingual, multimodal, culturally diverse benchmark covering 100 languages across 19 categories. It assesses the next generation of LMMs on cultural inclusivity.