vision-and-language topic
RaNet
Source code for our RaNet (EMNLP 2021)
pytorch_sscr
A PyTorch implementation of SSCR
HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
FactualSceneGraph
FACTUAL benchmark dataset and the pre-trained textual scene graph parser trained on FACTUAL.
x-lxmert
PyTorch code for EMNLP 2020 paper "X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers"
PartGlot
Official Implementation of PartGlot (CVPR 2022 Oral)
lang2seg
Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019
TSGV-Learning-List
Related work on Temporal Sentence Grounding in Videos / Natural Language Video Localization / Video Moment Retrieval
GroundVLP
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
MGPN
Source code for our MGPN (SIGIR 2022)