vision-language-model topic

List vision-language-model repositories

groundingLMM

596 Stars, 28 Forks

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

AlphaCLIP

520 Stars, 28 Forks

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Chat-UniVi

657 Stars, 31 Forks

[CVPR 2024 Highlight 🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

InstructCV

512 Stars, 45 Forks

[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"

InternLM-XComposer

1.8k Stars, 118 Forks

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

multi_token

150 Stars, 6 Forks

Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

ProbVLM

24 Stars, 2 Forks

ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

HGCLIP

29 Stars, 1 Fork

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding

LIQE

138 Stars, 8 Forks

[CVPR 2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective