vision-language topic

List vision-language repositories

QualiCLIP

116
Stars
3
Forks
116
Watchers

Quality-Aware Image-Text Alignment for Opinion-Unaware Image Quality Assessment

MSQNet

16
Stars
0
Forks
Watchers

Actor-agnostic Multi-label Action Recognition with Multi-modal Query [ICCVW '23]

SPEC

28
Stars
0
Forks
Watchers

[CVPR 2024] Official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"

LLaVA-pp

845
Stars
61
Forks
845
Watchers

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

PMA-Net

16
Stars
2
Forks
Watchers

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023

VideoGPT-plus

291
Stars
20
Forks
291
Watchers

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

rscir

78
Stars
2
Forks
Watchers

Official PyTorch implementation and benchmark dataset for the IGARSS 2024 oral paper "Composed Image Retrieval for Remote Sensing"

Qwen2-VL-Finetune

50
Stars
5
Forks
Watchers

An open-source implementation for fine-tuning the Qwen2-VL series by Alibaba Cloud.

VLM-Visualizer

259
Stars
21
Forks
259
Watchers

Visualizing the attention of vision-language models

lmms-finetune

357
Stars
41
Forks
357
Watchers

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.