multimodal-large-language-models topic
multimodal-large-language-models repositories
LLaVA-Mini
546
Stars
28
Forks
546
Watchers
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports the understanding of images, high-resolution images, and videos.
VideoChat
1.2k
Stars
148
Forks
1.2k
Watchers
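Real-time voice-interactive digital human, supporting both end-to-end voice and cascaded pipelines. Appearance and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3 s. Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice...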
vlm
15
Stars
2
Forks
15
Watchers
Composition of Multimodal Language Models From Scratch
ALM-Bench
45
Stars
3
Forks
45
Watchers
[CVPR 2025 🔥] ALM-Bench is a multilingual, multimodal, culturally diverse benchmark covering 100 languages across 19 categories. It assesses the next generation of LMMs on cultural inclusivity.