vision-and-language topic

List of vision-and-language repositories

Multimodal-GPT

1.5k
Stars
123
Forks

Multimodal-GPT

pacscore

51
Stars
4
Forks

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. CVPR 2023

pytorch_empirical-mvm

39
Stars
2
Forks

A PyTorch implementation of EmpiricalMVM

ONE-PEACE

942
Stars
59
Forks

A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

groundingLMM

744
Stars
37
Forks

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

AlphaCLIP

651
Stars
38
Forks

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.

ETPNav

193
Stars
18
Forks

[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"

VL-CheckList

121
Stars
4
Forks

Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.