multimodal-large-language-models topic

List multimodal-large-language-models repositories

MMWorld

Stars: 29, Forks: 1, Watchers: 29

Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

TRACE

Stars: 136, Forks: 3, Watchers: 136

[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling

star-vector

Stars: 3.1k, Forks: 162

StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textu...

GenHancer

Stars: 73, Forks: 1

(ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models.

SAR3D

Stars: 180, Forks: 4

Official repository for "SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE"

HoliTom

Stars: 57, Forks: 1, Watchers: 57

[NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models

LSDBench

Stars: 23, Forks: 0, Watchers: 23

(ICCV 2025) A benchmark focused on the sampling dilemma in long-video tasks. Its tasks are designed to evaluate the sampling efficiency of long-video VLMs.

SOP-LVM-ICL-Ensemble

Stars: 23, Forks: 3, Watchers: 23

[NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding