mscoco topic
bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
coco-caption
Adds the SPICE metric to the coco-caption evaluation server code
SPICE
Semantic Propositional Image Caption Evaluation
Swin-Transformer
This is an official implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
a-PyTorch-Tutorial-to-Image-Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
BMaskR-CNN
[ECCV 2020] Boundary-preserving Mask R-CNN
ml-cvnets
CVNets: A library for training computer vision networks
CoTNet
This is an official implementation of "Contextual Transformer Networks for Visual Recognition".
EdgeNets
This repository contains the source code for our work on designing efficient CNNs for computer vision.
VarifocalNet
VarifocalNet: An IoU-aware Dense Object Detector