vision-language-learning topics

OPERA (owner: shikiw)
Stars: 265 | Forks: 24
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Topics: chatbot, chatgpt, gpt-4, large-multimodal-models

Ovis (owner: AIDC-AI)
Stars: 1.4k | Forks: 83 | Watchers: 1.4k
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Topics: chatbot, llama3, multimodal, multimodal-large-language-models

Situation3D (owner: YunzeMan)
Stars: 17 | Forks: 1
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Topics: 3d-scene-understanding, deep-learning, multi-modal-learning, multimodal-learning

RLAIF-V (owner: RLHF-V)
Stars: 427 | Forks: 19 | Watchers: 427
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
Topics: chatbot, gpt-4v, llava, llava-next

Modality-Integration-Rate (owner: shikiw)
Stars: 107 | Forks: 2 | Watchers: 107
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
Topics: chatbot, gpt-4o, large-multimodal-models, llama