multimodal-large-language-models topic
Awesome-MLLM-LLM-Colab
Happy experimenting with MLLM and LLM models!
MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
star-vector
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textu...
GenHancer
(ICCV 2025) Enhances the fine-grained visual representations of CLIP and MLLMs with generative models.
SAR3D
Official repository for "SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE"
OmniVerifier
Generative Universal Verifier as Multimodal Meta-Reasoner
HoliTom
[NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models
LSDBench
A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency of long-video VLMs. (ICCV 2025)
SOP-LVM-ICL-Ensemble
[NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding