vision-language topic
VidIL
PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Active_VLN
Repository for the ECCV 2020 paper `Active Visual Information Gathering for Vision-Language Navigation`
NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
LViT
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
VaLM
VaLM: Visually-augmented Language Modeling. ICLR 2023.
rtic-gcn-pytorch
Official PyTorch implementation of RTIC
PKOL
[TIP 2022] Official code for the paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
S2-Transformer
[IJCAI 2022] Official PyTorch code for the paper “S2 Transformer for Image Captioning”
Chinese-CLIP
A Chinese version of CLIP that supports Chinese cross-modal retrieval and representation generation.
VLTVG
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022