vision-language topic

List vision-language repositories

mix-generation

107 Stars, 5 Forks

MixGen: A New Multi-Modal Data Augmentation

WaffleCLIP

48 Stars, 4 Forks

Official repository for the ICCV 2023 paper: "Waffling around for Performance: Visual Classification with Random Words and Broad Concepts"

ARP

31 Stars, 1 Fork

Guide Your Agent with Adaptive Multimodal Rewards (NeurIPS 2023 Accepted)

OpenFusion

102 Stars, 8 Forks

[ICRA 2024 Oral] Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

SOONet

19 Stars, 2 Forks

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

HQGA

29 Stars, 3 Forks

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)

Shot2Story

92 Stars, 6 Forks

Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.

debias-vision-lang

25 Stars, 4 Forks

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning [AACL 2022]

BagFormer

113 Stars, 33 Forks

PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

PoS-subspaces

28 Stars, 2 Forks

[NeurIPS'23] Parts of Speech–Grounded Subspaces in Vision-Language Models