large-multimodal-models topic
awesome-vision-time-series
This is an official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
LLaVA-STF
The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"
awesome-personalized-lmms
A curated list of Awesome Personalized Large Multimodal Models resources
Stream-Omni
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
Modality-Integration-Rate
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports the understanding of images, high-resolution images, and videos.
GUI-R1
Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents