Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

Issues (119)

I tried to generate a caption with caption.py, but I hit this warning and the script stopped. (img_cap_py3) volquelme@ubuntu:~/show_attend_and_tell_pytorch$ python caption.py --img='test_image/test1.jpg' --model='BEST_checkpoint_flickr8k_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='Flickr8k_output/WORDMAP_flickr8k_5_cap_per_img_5_min_word_freq.json' --beam_size=5 /home/volquelme/anaconda3/envs/img_cap_py3/lib/python3.6/site-packages/skimage/transform/_warps.py:24: UserWarning: The default multichannel argument...
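The `UserWarning` shown above is informational and does not by itself stop a script, so the real failure is likely further down the truncated traceback. If the warning noise is the concern, a minimal sketch for silencing it (matching the message text shown above):

```python
import warnings

# Silence skimage's deprecation notice about the multichannel default.
# This only hides the warning; any actual crash has a separate cause.
warnings.filterwarnings(
    "ignore",
    message="The default multichannel argument",
    category=UserWarning,
)
```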

Hi @sgrvinod, thanks for your code. When using beam search, how do we perform ensemble testing (testing multiple models and averaging predictions across models)? Should we add all the log...
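Since this tutorial's beam search works in log space, one common recipe is to average each model's log-probabilities at every decode step before the usual top-k selection. A minimal sketch, assuming a hypothetical `step()` method on each decoder and a shared vocabulary across models:

```python
import torch
import torch.nn.functional as F

def ensemble_scores(decoders, k_prev_words, states, encoder_outs):
    """One ensemble decode step: average per-model log-probabilities
    before the usual beam-search top-k. The .step() API here is a
    hypothetical stand-in, not part of this repo."""
    log_probs, new_states = [], []
    for dec, state, enc_out in zip(decoders, states, encoder_outs):
        logits, state = dec.step(k_prev_words, state, enc_out)  # hypothetical API
        log_probs.append(F.log_softmax(logits, dim=1))          # (s, vocab_size)
        new_states.append(state)
    # Arithmetic mean of log-probs = geometric mean of probabilities;
    # summing instead of averaging gives a product-of-experts variant.
    scores = torch.stack(log_probs, dim=0).mean(dim=0)
    return scores, new_states
```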

When I ran create_input_files.py, it saved 7 JSON files and 3 HDF5 files. Reading them, I understand the JSON files, but I don't know what the values...
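For reference, the 7 JSONs are the word map plus per-split captions and caption lengths, and the 3 HDF5 files hold the raw images for the train/val/test splits: in this repo's create_input_files.py, each HDF5 stores an `images` dataset and a `captions_per_image` attribute. A sketch for inspecting them, with file names following the convention visible in the logs above:

```python
import h5py
import json

# File names follow create_input_files.py's convention; this base name
# matches the Flickr8k run shown in the logs above.
base = 'flickr8k_5_cap_per_img_5_min_word_freq'

with h5py.File('TRAIN_IMAGES_' + base + '.hdf5', 'r') as h:
    print(h['images'].shape)              # (num_train_images, 3, 256, 256), uint8
    print(h.attrs['captions_per_image'])  # 5

# The per-split JSONs hold the encoded captions and their true lengths.
with open('TRAIN_CAPTIONS_' + base + '.json') as f:
    captions = json.load(f)  # lists of word indices, padded to max_len
with open('TRAIN_CAPLENS_' + base + '.json') as f:
    caplens = json.load(f)   # caption lengths including <start> and <end>
```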

Hi, not a bug per se, but I couldn't train on Windows 10 at first; I had to set the dataloader workers to 0. And for `scores, *_ = pack_padded_sequence(scores, decode_lengths,...
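On the second point: in recent PyTorch versions, pack_padded_sequence returns a PackedSequence namedtuple, and a common rewrite of that line takes its .data field explicitly. A sketch with dummy tensors standing in for the ones in train.py:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Dummy stand-ins for the tensors in train.py.
scores = torch.randn(4, 10, 100)                # (batch, max_decode_len, vocab)
targets = torch.zeros(4, 10, dtype=torch.long)  # (batch, max_decode_len)
decode_lengths = [10, 8, 6, 3]                  # sorted descending, as in train.py

# pack_padded_sequence returns a PackedSequence namedtuple here, so taking
# .data explicitly replaces the fragile `scores, *_ = ...` unpacking.
scores = pack_padded_sequence(scores, decode_lengths, batch_first=True).data
targets = pack_padded_sequence(targets, decode_lengths, batch_first=True).data
```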

Hi @sgrvinod, in caption.py, line 97, `scores = top_k_scores.expand_as(scores) + scores # (s, vocab_size)`. I wonder why it's added; shouldn't it be multiplied?
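The addition is correct because caption.py applies F.log_softmax just before this line, so `scores` holds log-probabilities: adding in log space is multiplying the underlying probabilities. A small illustration:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5, 9490)             # (s, vocab_size) dummy decoder output
log_probs = F.log_softmax(logits, dim=1)  # log p(word | hypothesis so far)

top_k_scores = torch.zeros(5, 1)          # running log p(hypothesis so far)
# log p(hypothesis + word) = log p(hypothesis) + log p(word | hypothesis):
# adding log-probabilities multiplies the underlying probabilities.
scores = top_k_scores.expand_as(log_probs) + log_probs  # (s, vocab_size)
```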

Hi, I am wondering why you use sum(decode_lengths), which to me means the total number of tokens in the batch, as the count to update the loss metrics. Isn't...
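The loss in train.py is a mean over the packed, unpadded tokens, so sum(decode_lengths) is exactly how many terms went into that mean; weighting the running average by the token count keeps it consistent across batches with different caption lengths. A sketch of the bookkeeping, mirroring the AverageMeter helper in utils.py:

```python
class AverageMeter:
    """Running average where each update carries a sample count,
    mirroring the AverageMeter helper in this repo's utils.py."""
    def __init__(self):
        self.sum, self.count, self.avg = 0.0, 0, 0.0

    def update(self, val, n=1):
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

losses = AverageMeter()
# The loss is a mean over all real (unpadded) tokens in the batch, so the
# right weight is the token count, not the batch size:
batch_loss = 2.31                # example mean cross-entropy for one batch
num_tokens = sum([10, 8, 6, 3])  # sum(decode_lengths)
losses.update(batch_loss, num_tokens)
print(losses.avg)
```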

Thank you very much for your useful tutorial. Could you kindly offer a tutorial for video captioning with soft attention? For example, this paper, Describing videos by exploiting...

Hi, I wanted to swap in different pretrained CNN models and see how they affect the final results, so I switched resnet101 to resnet50 in models.py and ran train.py. The model...
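ResNet-50 and ResNet-101 both end in 2048-channel feature maps, so the decoder's encoder_dim stays valid after that swap (ResNet-18/34 end in 512 channels and would need changes); the usual pitfall is instead loading a checkpoint that was trained with the old backbone. A quick shape check, following the encoder construction in models.py:

```python
import torch
import torchvision

# ResNet-50 and ResNet-101 both output 2048-channel feature maps, so the
# decoder's encoder_dim=2048 still holds; ResNet-18/34 output 512 channels.
resnet = torchvision.models.resnet50(pretrained=True)
modules = list(resnet.children())[:-2]  # drop avgpool and fc, as in models.py
encoder = torch.nn.Sequential(*modules)

with torch.no_grad():
    feats = encoder(torch.randn(1, 3, 256, 256))
print(feats.shape)  # torch.Size([1, 2048, 8, 8])
```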

https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/b0467042e3fec1ef72c323ccb41fb174a4f1ea52/train.py#L64 Why do you use two optimizers here? It seems other people use only one optimizer, which accepts both the encoder's and decoder's params: https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/image_captioning/train.py#L45 Thanks
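Two optimizers simply let the fine-tuned encoder and the decoder run at different learning rates; a single optimizer with parameter groups achieves the same thing. A sketch with stand-in modules, using the default learning rates from train.py:

```python
import torch

# Stand-in modules; in train.py these are the Encoder and DecoderWithAttention.
encoder = torch.nn.Linear(10, 10)
decoder = torch.nn.Linear(10, 10)

# One optimizer, two parameter groups: equivalent to the two-optimizer setup,
# and still lets the fine-tuned encoder use a smaller learning rate.
optimizer = torch.optim.Adam([
    {'params': encoder.parameters(), 'lr': 1e-4},  # encoder_lr in train.py
    {'params': decoder.parameters(), 'lr': 4e-4},  # decoder_lr in train.py
])
```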

After training, when I use the model to generate captions, it gives me the error below: `File "caption.py", line 215, in seq, alphas = caption_image_beam_search(encoder, decoder, args.img, word_map, args.beam_size)...
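The truncated traceback doesn't show the root cause, but one frequently reported failure mode in this function is that no beam emits <end> within the 50-step limit, leaving complete_seqs_scores empty. A hedged guard sketch (variable names match caption.py; dummy values stand in for the real beam state):

```python
import torch

# Dummy stand-ins for the beam-search state in caption_image_beam_search().
complete_seqs, complete_seqs_scores = [], []   # nothing reached <end>
seqs = torch.tensor([[9, 23, 7], [9, 4, 12]])  # surviving (incomplete) beams
top_k_scores = torch.tensor([[-1.2], [-3.4]])  # their running log-probs

# Hypothetical guard: if no hypothesis emitted <end> within the step limit,
# fall back to the best incomplete beam instead of crashing on an empty list.
if len(complete_seqs_scores) == 0:
    complete_seqs = seqs.tolist()
    complete_seqs_scores = top_k_scores.squeeze(1).tolist()

i = complete_seqs_scores.index(max(complete_seqs_scores))
seq = complete_seqs[i]
print(seq)
```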