vision-language-learning topic

List of vision-language-learning repositories

OPERA

265
Stars
24
Forks
Watchers

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Ovis

1.4k
Stars
83
Forks
1.4k
Watchers

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Situation3D

17
Stars
1
Forks
Watchers

[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning

RLAIF-V

427
Stars
19
Forks
427
Watchers

[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Modality-Integration-Rate

107
Stars
2
Forks
107
Watchers

[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".