vision-and-language topic
pacscore
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. CVPR 2023
pytorch_empirical-mvm
A PyTorch implementation of EmpiricalMVM
ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
ETPNav
[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.
ShowAnything