mllm topic
Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
GenerateU
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
MIKO
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discover
Bunny
A family of lightweight multimodal models.
ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Groma
Grounded Multimodal Large Language Model with Localized Visual Tokenization
VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating