vision-language-model topic
groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first model of its kind capable of generating natural language responses seamlessly integrated with object segmentation masks.
AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
InstructCV
[ICLR 2024] Official codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
Awesome-Multimodal-LLM
Reading list for Multimodal Large Language Models
InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
multi_token
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
ProbVLM
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
HGCLIP
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
LIQE
[CVPR 2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective