large-multimodal-models topic

List of large-multimodal-models repositories

lmms-finetune

357
Stars
41
Forks
357
Watchers

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.

LLaVA-UHD-Better

31
Stars
3
Forks

A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo

VITA

801
Stars
41
Forks

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

MixEval

219
Stars
32
Forks

The official evaluation suite and dynamic data release for MixEval.

Multi-Modal-Large-Language-Learning

22
Stars
0
Forks

A curated list of multi-modal large language model papers and projects, along with a collection of popular training strategies, e.g., PEFT and LoRA.

MMRole

22
Stars
1
Forks

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

apiprompting

106
Stars
6
Forks
106
Watchers

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

MMMA_Rationality

15
Stars
0
Forks

The official repository of the paper "Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey".

reverse_vlm

30
Stars
3
Forks

🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"

GeoPixel

129
Stars
15
Forks
129
Watchers

GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabi...