vision-and-language topic

List vision-and-language repositories

video_captioning_datasets

110 Stars, 12 Forks

Summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*.

clevr-dialog

44 Stars, 2 Forks

Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog

FrozenBiLM

144 Stars, 23 Forks

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

rosita

55 Stars, 13 Forks

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

IAIS

30 Stars, 4 Forks

[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval

hulc

58 Stars, 9 Forks

Hierarchical Universal Language Conditioned Policies

Explore-And-Match

42 Stars, 2 Forks

Official PyTorch implementation of "Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos"

robo-vln

66 Stars, 8 Forks

PyTorch code for the ICRA'21 paper "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Xmodal-Ctx

60 Stars, 10 Forks

Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

X2-VLM

115 Stars, 9 Forks

All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)