Repositories under the visual-question-answering topic
easy-VQA: The Easy Visual Question Answering dataset.
hexia: Mid-level PyTorch-based framework for visual question answering.
bottom-up-attention: Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome.
FiLM-pytorch: PyTorch implementation of "FiLM: Visual Reasoning with a General Conditioning Layer".
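To illustrate the technique this repo implements (a sketch of the published FiLM idea, not the repo's actual API), a FiLM layer predicts per-channel scale and shift coefficients from a conditioning vector, such as a question embedding, and applies them feature-wise to image feature maps:

```python
import numpy as np

def film_layer(features, cond, w, b):
    """FiLM sketch: predict per-channel (gamma, beta) from a conditioning
    vector and modulate the feature maps. All names here are illustrative.
    features: (B, C, H, W) image feature maps
    cond:     (B, D) conditioning vector (e.g. a question embedding)
    w, b:     linear projection weights (D, 2C) and bias (2C,)
    """
    gamma_beta = cond @ w + b                  # (B, 2C)
    C = features.shape[1]
    gamma = gamma_beta[:, :C, None, None]      # (B, C, 1, 1), broadcasts over H, W
    beta = gamma_beta[:, C:, None, None]
    return gamma * features + beta             # feature-wise linear modulation

rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 8, 4, 4))          # 2 images, 8 channels
q = rng.normal(size=(2, 16))                   # 2 question embeddings
out = film_layer(feats, q, rng.normal(size=(16, 16)), np.zeros(16))
print(out.shape)  # (2, 8, 4, 4)
```

In the paper, this modulation is inserted inside residual blocks of a CNN, so the same convolutional features get reused across questions while the question steers the computation.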
ban-vqa: Bilinear attention networks for visual question answering.
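The core idea behind bilinear attention, in its usual formulation (this sketch is not the repo's code), is to score every pair of image regions and question words with a bilinear form and normalize the scores into a joint attention map:

```python
import numpy as np

def _softmax(z):
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def bilinear_attention(X, Y, W):
    """Bilinear attention sketch (illustrative names and shapes):
    X: (R, Dx) image region features
    Y: (T, Dy) question word features
    W: (Dx, Dy) learned bilinear weight matrix
    Returns the (R, T) attention map and an attended image feature."""
    scores = X @ W @ Y.T                           # (R, T) pairwise bilinear scores
    attn = _softmax(scores.ravel()).reshape(scores.shape)
    joint = np.einsum('rt,rd->d', attn, X)         # attention-weighted pooling, (Dx,)
    return attn, joint
```

Because the map is normalized over all region-word pairs at once, mass concentrates on the pairs most relevant to answering, rather than attending to regions and words independently.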
just-ask: Just Ask: Learning to Answer Questions from Millions of Narrated Videos (ICCV 2021 Oral and TPAMI).
FVTA_MemexQA: Real-world photo-sequence question answering system (MemexQA). CVPR 2018 and TPAMI 2019.
BLIP: PyTorch code for "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation".
mcan-vqa: Deep Modular Co-Attention Networks for Visual Question Answering.
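Modular co-attention stacks two kinds of scaled dot-product attention units; a minimal sketch of one such step (simplified and with illustrative names, not the repo's implementation):

```python
import numpy as np

def _softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention, the building block of the co-attention
    modules. Q: (n, d), K: (m, d), V: (m, d) -> (n, d)."""
    d = Q.shape[-1]
    return _softmax(Q @ K.T / np.sqrt(d)) @ V

def co_attention_step(img, ques):
    """One simplified co-attention step: question words first attend to each
    other (self-attention), then image regions attend to the refined question
    representation (question-guided attention)."""
    ques = attention(ques, ques, ques)   # self-attention over question words
    img = attention(img, ques, ques)     # question-guided image attention
    return img, ques
```

The full model stacks several such units in depth, which is the "deep modular" part of the name; single-head attention without projections is shown here only to keep the sketch short.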
xmodaler: X-modaler is a versatile, high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning).