multimodal-large-language-models topics

StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textu...

joanrod

Topics: llm, multimodal-large-language-models, svg, vlm

GenHancer

Stars: 73 | Forks: 1

(ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models.

mashijie1028

Topics: generative-models, multimodal-large-language-models, visual-representation-learning

SAR3D

Stars: 180 | Forks: 4

Official repository for "SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE"

cyw-3d

Topics: 3d-understanding, multimodal-large-language-models, single-image-to-3d, text-to-3d

OmniVerifier

Stars: 34 | Forks: 3

Generative Universal Verifier as Multimodal Meta-Reasoner

Cominclip

Topics: llm-as-a-judge, multimodal-large-language-models, multimodal-reasoning, vision-language-model

HoliTom

Stars: 57 | Forks: 1 | Watchers: 57

[NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models

cokeshao

Topics: large-language-models, llava, llava-next-video, multimodal-large-language-models

LSDBench

Stars: 23 | Forks: 0 | Watchers: 23

A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency of long-video VLMs. (ICCV 2025)

dvlab-research

Topics: benchmark, long-video-understanding, multimodal-large-language-models, reasoning-agent