a-PyTorch-Tutorial-to-Image-Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
I tried to generate a caption with caption.py, but I hit this warning and the script stopped. (img_cap_py3) volquelme@ubuntu:~/show_attend_and_tell_pytorch$ python caption.py --img='test_image/test1.jpg' --model='BEST_checkpoint_flickr8k_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='Flickr8k_output/WORDMAP_flickr8k_5_cap_per_img_5_min_word_freq.json' --beam_size=5 /home/volquelme/anaconda3/envs/img_cap_py3/lib/python3.6/site-packages/skimage/transform/_warps.py:24: UserWarning: The default multichannel argument...
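For what it's worth, a UserWarning by itself does not halt a Python script, so the actual failure is probably further down the truncated output. Below is a minimal sketch of avoiding the skimage deprecation warning, assuming the image is resized with skimage.transform.resize (that assumption may not match this particular fork's caption.py):

```python
import warnings
import numpy as np
from skimage.transform import resize

# Explicit keyword arguments avoid skimage's "default will change" warnings.
img = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # toy stand-in image
img = resize(img, (256, 256), mode='reflect', anti_aliasing=True,
             preserve_range=True).astype(np.uint8)

# Or, more bluntly, filter the warning category entirely:
warnings.filterwarnings('ignore', category=UserWarning, module='skimage')
```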
Hi @sgrvinod, thanks for your code. When using beam search, how do we perform ensemble testing (testing multiple models and averaging predictions across them)? Should we add all the log...
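One common approach is to average the per-step probabilities across models and feed the resulting log-probabilities to an otherwise unchanged beam search. The sketch below assumes each decoder exposes a per-step `step(prev_words, state, encoder_out)` method, which is a hypothetical interface, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def ensemble_step_scores(decoders, states, prev_words, encoder_outs):
    """Average next-word probabilities across an ensemble of decoders.

    `step` is a hypothetical per-model interface returning (logits, new_state);
    the surrounding beam search stays the same, only the scores it ranks change.
    """
    probs, new_states = [], []
    for decoder, state, enc_out in zip(decoders, states, encoder_outs):
        logits, state = decoder.step(prev_words, state, enc_out)
        probs.append(F.softmax(logits, dim=-1))
        new_states.append(state)
    avg_probs = torch.stack(probs, dim=0).mean(dim=0)
    # Beam search adds these log-probabilities to the running hypothesis scores.
    return torch.log(avg_probs), new_states
```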
When I ran create_input_files.py, it saved 7 JSON files and 3 HDF5 files. I understand the JSON files when I read them, but I don't know what the values...
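A quick way to see what ended up in the HDF5 files is to open one with h5py and list its datasets and attributes. The path below is only an example; use whatever create_input_files.py wrote to your output folder:

```python
import h5py

# Example path; substitute the file create_input_files.py actually produced.
path = 'Flickr8k_output/TRAIN_IMAGES_flickr8k_5_cap_per_img_5_min_word_freq.hdf5'
with h5py.File(path, 'r') as h:
    print(dict(h.attrs))                     # file-level attributes, e.g. captions per image
    for name, dset in h.items():
        print(name, dset.shape, dset.dtype)  # stored image data and its layout
```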
Hi, not a bug per se, but I couldn't train on Windows 10 at first. I had to set the DataLoader workers to 0. And for ``` scores, *_ = pack_padded_sequence(scores, decode_lengths,...
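For context: on Windows, `num_workers=0` keeps data loading in the main process, which sidesteps multiprocessing issues. And in recent PyTorch versions `pack_padded_sequence` returns a four-field PackedSequence, so taking `.data` is a bit more robust than tuple-unpacking. A minimal, self-contained sketch with toy tensors:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Toy stand-ins for the real batch: 2 captions, max length 4, vocab of 5.
scores = torch.randn(2, 4, 5)            # (batch, max_decode_len, vocab_size)
targets = torch.randint(0, 5, (2, 4))    # (batch, max_decode_len)
decode_lengths = [4, 3]

# PackedSequence now has (data, batch_sizes, sorted_indices, unsorted_indices),
# so unpacking a fixed number of fields can break; .data always works.
scores = pack_padded_sequence(scores, decode_lengths, batch_first=True).data
targets = pack_padded_sequence(targets, decode_lengths, batch_first=True).data
print(scores.shape, targets.shape)       # (sum(decode_lengths), 5) and (sum(decode_lengths),)
```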
Hi @sgrvinod, in caption.py, line 97, `scores = top_k_scores.expand_as(scores) + scores # (s, vocab_size)`. I wonder why the scores are added. Shouldn't they be multiplied?
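The per-step scores at that point are log-probabilities (the output of log_softmax), so adding them is the same as multiplying the underlying probabilities. A small sketch:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 7)                    # (beam_size, vocab_size)
step_scores = F.log_softmax(logits, dim=1)    # log p(w_t | prefix)
top_k_scores = torch.randn(3, 1)              # running log p(prefix) per hypothesis

# log p(prefix, w_t) = log p(prefix) + log p(w_t | prefix),
# i.e. addition in log space is multiplication of probabilities.
total_scores = top_k_scores.expand_as(step_scores) + step_scores
```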
Hi, I am wondering why you use sum(decode_lengths), which to me means the total number of tokens in the batch, as the count to update the loss metrics. Isn't...
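For reference, cross-entropy computed on packed scores is a mean over tokens, so weighting the running average by the number of tokens in each batch reproduces the overall per-token mean. A minimal sketch (the meter here is illustrative, not the repo's AverageMeter):

```python
class RunningAverage:
    """Weighted running average: each update carries the count it was averaged over."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def update(self, value, n=1):
        self.total += value * n   # value is already a mean over n items
        self.count += n

    @property
    def avg(self):
        return self.total / max(self.count, 1)

# Per-token loss averaged correctly across batches with different token counts:
meter = RunningAverage()
meter.update(2.0, 10)   # batch with 10 tokens, mean loss 2.0
meter.update(1.0, 30)   # batch with 30 tokens, mean loss 1.0
print(meter.avg)        # 1.25, the true mean over all 40 tokens
```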
Thank you very much for your useful tutorial. Could you kindly offer a tutorial for video captioning with soft attention? For example, this paper, Describing videos by exploiting...
Hi, I wanted to switch between different pretrained CNN models and see how they affect the final results. So I switched resnet101 to resnet50 in models.py and ran train.py. The model...
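One thing to watch when swapping backbones is the encoder's output channel count: resnet50 and resnet101 both end in 2048 channels, but resnet18/34 end in 512, so the attention and decoder dimensions have to match. A minimal sketch of deriving that number from the chosen backbone (an assumption about how the swap is wired, not the repo's exact code):

```python
import torch.nn as nn
import torchvision

# Pick a backbone and keep everything up to (but not including) avgpool and fc.
backbone = torchvision.models.resnet50(pretrained=True)
encoder = nn.Sequential(*list(backbone.children())[:-2])

# Build the decoder/attention with this channel count
# (2048 for resnet50/101, 512 for resnet18/34):
encoder_dim = backbone.fc.in_features
print(encoder_dim)
```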
https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/b0467042e3fec1ef72c323ccb41fb174a4f1ea52/train.py#L64 Why do you use two optimizers here? It seems other people use only one optimizer, which accepts both the encoder's and the decoder's params https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/image_captioning/train.py#L45 Thanks
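For comparison, a single optimizer with parameter groups can hold both modules while still giving each its own learning rate; the updates are the same either way, and separate optimizers mainly make it easier to skip the encoder optimizer when fine-tuning is off. The modules and learning rates below are stand-ins for illustration:

```python
import torch
import torch.nn as nn

# Toy stand-ins; in train.py these would be the Encoder and DecoderWithAttention.
encoder, decoder = nn.Linear(4, 4), nn.Linear(4, 4)

# One optimizer, two parameter groups, each with its own learning rate.
optimizer = torch.optim.Adam([
    {'params': [p for p in encoder.parameters() if p.requires_grad], 'lr': 1e-4},
    {'params': [p for p in decoder.parameters() if p.requires_grad], 'lr': 4e-4},
])
```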
After training, when I use the model to generate captions, it starts giving me the error below: `File "caption.py", line 215, in seq, alphas = caption_image_beam_search(encoder, decoder, args.img, word_map, args.beam_size)...