vision-language topics

shine

38

Stars

2

Forks

Watchers

[CVPR'24 Highlight] SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

naver

open-vocabulary-detection

vision-language

VLM-Captioning-Tools

44

Stars

0

Forks

44

Watchers

Python scripts to use for captioning images with VLMs

ProGamerGov

cogvlm

image-captioning

llama3

llm

ORacle

22

Stars

0

Forks

Watchers

Official code of the paper ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling accepted at MICCAI 2024.

egeozsoy

deep-learning

knowledge

large-language-model

llm

AutoConverter

28

Stars

2

Forks

Watchers

Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 2025)

yuhui-zh15

computer-vision

machine-learning

natural-language-processing

vision-language

grove

25

Stars

0

Forks

25

Watchers

Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)

ekazakos

automatic-annotation

large-scale-pretraining

video-captioning

video-grounding

Llama3.2-Vision-Finetune

172

Stars

25

Forks

172

Watchers

An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta.

2U1

llama3

multi-modal

vision-language

vision-language-model

vision-ai-checkup

42

Stars

11

Forks

42

Watchers

Take your LLM to the optometrist.

roboflow

llm

llm-benchmarking

vision-language

vision-language-model

Cross-the-Gap

56

Stars

1

Forks

56

Watchers

[ICLR 2025] - Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion

miccunifi

clip

contrastive-learning

iclr2025

image-classification

HALVA

17

Stars

0

Forks

17

Watchers

[ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination

pritamqu

hallucination-mitigation

multimodal-large-language-models

vision-language