mllm topics

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

gokayfem

comfyui

custom-nodes

image-captioning

img2sfx

mPLUG-DocOwl

Stars: 1.0k, Forks: 58

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

X-PLUG

chart-understanding

document-understanding

mllm

multimodal

Groma

Stars: 404, Forks: 50

Grounded Multimodal Large Language Model with Localized Visual Tokenization

FoundationVision

foundation-models

grounding

large-language-models

llama

VisualWebBench

Stars: 34, Forks: 0

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Youku-mPLUG

Stars: 262, Forks: 11

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

mllm

mPLUG-HalOwl

Stars: 59, Forks: 1

mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating

mllm