large-multimodal-models topic
lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
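As a rough illustration of the kind of finetuning such a codebase targets, here is a minimal sketch using the generic Hugging Face transformers + peft APIs (not lmms-finetune's own interface); the checkpoint name and LoRA hyperparameters are assumptions for the example.

```python
# Minimal sketch: LoRA finetuning setup for a LLaVA-1.5 checkpoint.
# Assumes the public llava-hf/llava-1.5-7b-hf weights; not the lmms-finetune API.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint for illustration
processor = AutoProcessor.from_pretrained(model_id)  # prepares image + text inputs
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

# Attach low-rank adapters to the attention projections; only these parameters train.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```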
LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
MixEval
The official evaluation suite and dynamic data release for MixEval.
Multi-Modal-Large-Language-Learning
Awesome multi-modal large language model papers and projects, with collections of popular training strategies, e.g., PEFT, LoRA.
MMRole
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
MMMA_Rationality
This is the official repository of the paper "Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey"
reverse_vlm
🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"
GeoPixel
GeoPixel, a pixel grounding large multimodal model for remote sensing, is developed specifically for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.