Show-Attend-and-Tell
A PyTorch implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
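At the heart of the paper is soft additive attention over convolutional feature maps, in the style of Bahdanau et al. (see References). The module below is an illustrative sketch of that mechanism; the layer names and dimensions are assumptions, not this repository's exact implementation.

```python
import torch
import torch.nn as nn


class SoftAttention(nn.Module):
    """Additive (Bahdanau-style) attention over encoder feature maps.

    Illustrative sketch only; names and sizes are assumptions,
    not this repository's exact code.
    """

    def __init__(self, encoder_dim, hidden_dim, attn_dim):
        super().__init__()
        self.encoder_proj = nn.Linear(encoder_dim, attn_dim)  # project image features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)    # project decoder state
        self.score = nn.Linear(attn_dim, 1)                   # scalar attention score

    def forward(self, features, hidden):
        # features: (batch, num_regions, encoder_dim); hidden: (batch, hidden_dim)
        attn = torch.tanh(self.encoder_proj(features)
                          + self.hidden_proj(hidden).unsqueeze(1))
        alpha = torch.softmax(self.score(attn).squeeze(-1), dim=1)  # (batch, num_regions)
        context = (features * alpha.unsqueeze(-1)).sum(dim=1)       # weighted image context
        return context, alpha
```

At each decoding step the context vector is combined with the current word embedding, and the weights `alpha` are what produce the "where the model is looking" visualizations from the paper.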
A trained model that can be loaded into the decoder is available for download.
Training Statistics
BLEU scores for VGG19 (orange) and ResNet152 (red), trained with teacher forcing.
| BLEU Score | Graph | Top-K Accuracy | Graph |
|---|---|---|---|
| BLEU-1 | *(graph)* | Training Top-1 | *(graph)* |
| BLEU-2 | *(graph)* | Training Top-5 | *(graph)* |
| BLEU-3 | *(graph)* | Validation Top-1 | *(graph)* |
| BLEU-4 | *(graph)* | Validation Top-5 | *(graph)* |
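Scores like these are conventionally computed as corpus BLEU over tokenized hypotheses with multiple reference captions per image (COCO provides five). Below is a minimal sketch using NLTK; the data layout shown is an assumption for illustration, not this repository's exact evaluation code.

```python
from nltk.translate.bleu_score import corpus_bleu

# Each hypothesis is scored against a list of reference captions.
# Toy data for illustration only; real captions come from the COCO annotations.
references = [[["a", "dog", "runs", "on", "grass"],
               ["a", "dog", "running", "through", "a", "field"]]]
hypotheses = [["a", "dog", "runs", "through", "grass"]]

# The weights select the n-gram order: BLEU-1 through BLEU-4.
for n, weights in enumerate([(1, 0, 0, 0), (0.5, 0.5, 0, 0),
                             (1/3, 1/3, 1/3, 0), (0.25, 0.25, 0.25, 0.25)],
                            start=1):
    print(f"BLEU-{n}: {corpus_bleu(references, hypotheses, weights=weights):.4f}")
```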
To Train
This was written in Python 3 and may not work with Python 2. Download the COCO training and validation images and put them in `data/coco/imgs/train2014` and `data/coco/imgs/val2014`, respectively. Put the COCO dataset split JSON file from Deep Visual-Semantic Alignments in `data/coco/`; it should be named `dataset.json`.
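After these steps, the layout under `data/` should be:

```
data/coco/
├── dataset.json          # split file from Deep Visual-Semantic Alignments
└── imgs/
    ├── train2014/        # COCO training images
    └── val2014/          # COCO validation images
```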
Run the preprocessing to create the needed JSON files:

`python generate_json_data.py`
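For context, the Karpathy split file stores each image's filename, its tokenized captions, and a train/val/test split label. The sketch below shows the kind of split such preprocessing performs; it assumes that format, and the output filenames are hypothetical, not necessarily what `generate_json_data.py` produces.

```python
import json
from collections import defaultdict

# Assumed Karpathy-split layout: {"images": [{"filename": ..., "split": ...,
# "sentences": [{"tokens": [...]}, ...]}, ...]}.
with open("data/coco/dataset.json") as f:
    data = json.load(f)

captions = defaultdict(list)
for img in data["images"]:
    # "restval" is conventionally folded into the training split.
    split = "train" if img["split"] in ("train", "restval") else img["split"]
    for sent in img["sentences"]:
        captions[split].append({"filename": img["filename"],
                                "tokens": sent["tokens"]})

for split, items in captions.items():
    # Output filenames are hypothetical placeholders.
    with open(f"data/coco/{split}_captions.json", "w") as f:
        json.dump(items, f)
```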
Start the training by running:

`python train.py`
The models will be saved in `model/` and the training statistics in `runs/`. To view the training statistics, use:

`tensorboard --logdir runs`
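The statistics are written in TensorBoard's event format; below is a minimal sketch of how such scalars are typically logged with PyTorch's `SummaryWriter` (the tag names and values are assumptions, not necessarily those used by `train.py`).

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs")  # matches the --logdir used above
for epoch in range(10):
    loss = 1.0 / (epoch + 1)                      # placeholder value
    writer.add_scalar("train/loss", loss, epoch)  # tag name is an assumption
writer.close()
```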
To Generate Captions
`python generate_caption.py --img-path <PATH_TO_IMG> --model <PATH_TO_MODEL_PARAMETERS>`
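Under the hood, caption generation is typically a greedy (or beam-search) decoding loop: encode the image once, then repeatedly attend and take the argmax token until an end token is produced. A hedged sketch, assuming hypothetical encoder/decoder interfaces and a vocabulary with `<start>`/`<end>` tokens:

```python
import torch


@torch.no_grad()
def greedy_caption(encoder, decoder, image, vocab, max_len=20):
    """Greedy decoding sketch; the encoder/decoder interfaces and vocab
    structure are assumptions, not this repository's exact API."""
    features = encoder(image.unsqueeze(0))   # (1, num_regions, encoder_dim)
    hidden = decoder.init_hidden(features)   # hypothetical initializer
    word = torch.tensor([vocab["<start>"]])
    caption = []
    for _ in range(max_len):
        logits, hidden = decoder.step(word, features, hidden)  # hypothetical step()
        word = logits.argmax(dim=-1)
        if word.item() == vocab["<end>"]:
            break
        caption.append(word.item())
    return caption
```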
Todo
- [x] Create image encoder class
- [x] Create decoder class
- [x] Create dataset loader
- [x] Write main function for training and validation
- [x] Implement attention model
- [x] Implement decoder feed forward function
- [x] Write training function
- [x] Write validation function
- [x] Add BLEU evaluation
- [ ] Update code to use GPU only when available, otherwise use CPU (see the device-selection sketch after this list)
- [x] Add performance statistics
- [x] Allow encoder to use resnet-152 and densenet-161
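For the remaining item above, the standard PyTorch pattern is to select the device once at startup and move the model and tensors onto it:

```python
import torch

# Fall back to CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(10, 2).to(device)    # placeholder model
inputs = torch.randn(4, 10, device=device)   # tensors created on the same device
outputs = model(inputs)
```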
Captioned Examples
Correctly Captioned Images
Incorrectly Captioned Images
References
- Original Theano implementation of Show, Attend and Tell
- Neural Machine Translation by Jointly Learning to Align and Translate