large-multimodal-models topic
lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
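As a rough illustration of the kind of finetuning such a codebase targets, here is a minimal sketch using the generic Hugging Face transformers + peft APIs (not lmms-finetune's own interface); the checkpoint name and LoRA hyperparameters are assumptions for the example.

```python
# Minimal sketch: LoRA finetuning setup for a LLaVA-1.5 checkpoint.
# Assumes the public llava-hf/llava-1.5-7b-hf weights; not the lmms-finetune API.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint for illustration
processor = AutoProcessor.from_pretrained(model_id)  # prepares image + text inputs
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

# Attach low-rank adapters to the attention projections; only these parameters train.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```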
LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
MixEval
The official evaluation suite and dynamic data release for MixEval.
Multi-Modal-Large-Language-Learning
Awesome multi-modal large language model papers and projects, with collections of popular training strategies, e.g., PEFT, LoRA.
MMRole
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
MMMA_Rationality
This is the official repository of the paper "Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey"
reverse_vlm
🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"
GeoPixel
GeoPixel, a pixel grounding large multimodal model for remote sensing, is developed specifically for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.