a-PyTorch-Tutorial-to-Image-Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
I tried to overfit the model, but it seems very hard to achieve. I've reduced the data to 100 images with 1 caption per image, and the lowest loss I...
I run the code on a Linux server using a GPU, but CPU usage is very high. Is there anything wrong?
In the CaptionDataset's `__getitem__()` method, you divided the image tensor by 255: `img = torch.FloatTensor(self.imgs[i // self.cpi] / 255.)`. Is it for regularization or for something else? Thanks!
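For context, dividing by 255 is input normalization rather than regularization: the images are stored as uint8 intensities in [0, 255], and the division rescales them to floats in [0, 1], which is the range the subsequent ImageNet mean/std normalization expects. A minimal sketch (the variable names here are illustrative):

```python
import torch

# Raw image pixels are stored as uint8 in [0, 255].
raw = torch.tensor([[0, 128, 255]], dtype=torch.uint8)

# Dividing by 255 rescales them to floats in [0, 1], matching what the
# downstream mean/std normalization expects as input.
img = torch.FloatTensor(raw.numpy() / 255.)
```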
Hello, thanks for your nice code. Does the size of the vocabulary affect the final result? The vocabulary size in other people's work is different from yours on...
In line 94 of caption.py you use `scores = F.log_softmax(scores, dim=1)`. Could you explain the reason for `log_softmax` here? You did not use it in the `forward()` method. More than that,...
Hey, I'm just wondering: can I use the original captions from Flickr8k (Flickr8k.token.txt) and Flickr30k (results_20130124.token) with this code, instead of the captions from Karpathy's split? Thank you very much.
Hi, thanks for your great tutorial, with a nice guide and code. After reading the decoder's code, I found that you just use the LSTM's hidden states to compute the next word's...
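For reference, in this decoder design the attention context does influence the prediction, but indirectly: it is fed into the LSTM as part of its input, and the next-word scores are then projected from the hidden state alone. A minimal sketch of one decode step under that assumption (dimensions and names are illustrative):

```python
import torch
import torch.nn as nn

embed_dim, encoder_dim, decoder_dim, vocab_size = 32, 64, 128, 100
lstm_cell = nn.LSTMCell(embed_dim + encoder_dim, decoder_dim)
fc = nn.Linear(decoder_dim, vocab_size)

emb = torch.randn(1, embed_dim)         # embedding of the previous word
context = torch.randn(1, encoder_dim)   # attention-weighted encoder features
h = torch.zeros(1, decoder_dim)
c = torch.zeros(1, decoder_dim)

# The context enters the LSTM as part of its input; the next word's
# scores are then computed from the hidden state alone.
h, c = lstm_cell(torch.cat([emb, context], dim=1), (h, c))
scores = fc(h)                          # (1, vocab_size)
```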
Great tutorial, thanks! In the case of "hard" attention, you mentioned in your tutorial that it is not differentiable, so maybe this is why a new objective function `Ls` is...
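Right: hard attention samples a single location stochastically, so gradients cannot flow through the sampling step and a REINFORCE-style objective is needed; soft attention instead takes a differentiable weighted average over all locations. A minimal sketch of the soft-attention context vector (names and sizes are illustrative):

```python
import torch
import torch.nn.functional as F

num_pixels, encoder_dim = 196, 512      # e.g. a 14x14 encoder feature map
features = torch.randn(1, num_pixels, encoder_dim)
att_scores = torch.randn(1, num_pixels) # unnormalized attention energies

# Soft attention: softmax weights sum to 1, and the context vector is a
# weighted average over ALL pixels -- fully differentiable end to end.
alpha = F.softmax(att_scores, dim=1)                  # (1, num_pixels)
context = (features * alpha.unsqueeze(2)).sum(dim=1)  # (1, encoder_dim)
```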
I tried to run the code and got some errors when creating the input files. Could you list the required package versions to run the code?
`incomplete_inds = [ind for ind, next_word in enumerate(next_word_inds) if next_word != word_map['<end>']]` — `incomplete_inds` is always `[0, 1, 2, 3, 4]`, and then `complete_inds = list(set(range(len(next_word_inds))) - set(incomplete_inds))` is empty, so `complete_seqs` is...
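If `incomplete_inds` never shrinks, no beam is ever predicting the `<end>` token; common causes are an undertrained model or a word map at inference that differs from the one used in training. The partition logic itself is straightforward, as this toy example shows (values are made up, and it assumes `<end>` is in the word map):

```python
word_map = {'a': 0, 'cat': 1, '<end>': 2}
next_word_inds = [1, 2, 0, 1, 2]   # predicted word index for each of 5 beams

# Beams whose newest word is NOT <end> continue; the rest are finished.
incomplete_inds = [ind for ind, next_word in enumerate(next_word_inds)
                   if next_word != word_map['<end>']]
complete_inds = list(set(range(len(next_word_inds))) - set(incomplete_inds))

print(incomplete_inds)  # [0, 2, 3]
```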