multimodal-large-language-models topic
VisualWebBench
Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
EasyDetect
An Easy-to-use Hallucination Detection Framework for LLMs.
MineLand
Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs
seemore
From-scratch implementation of a vision language model in pure PyTorch
mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...
Awesome-Medical-Large-Language-Models
Curated papers on Large Language Models in the healthcare and medical domain
polite-flamingo
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
VideoTGB
[EMNLP 2024] A Video Chat Agent with Temporal Prior