vision-and-language topic
cyclical-visual-captioning
PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision
regretful-agent
PyTorch code for CVPR 2019 paper: The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
selfmonitoring-agent
PyTorch code for ICLR 2019 paper: Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
clip_playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
CBP
Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
HERO_Video_Feature_Extractor
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos