a-PyTorch-Tutorial-to-Image-Captioning
Why do you use sum(decode_lengths) as the count to update losses and top-k accuracies?
Hi, I am wondering why you use sum(decode_lengths), which, to me, means the total number of tokens in the batch, as the count to update the loss metrics.
Wouldn't the batch size be more appropriate?
Thanks a lot!
Hi, I did this because the loss is computed for each token and then averaged over the total number of tokens in the batch.
After packing, both the scores and targets tensors have sum(decode_lengths) rows. Therefore, the loss computed in this line has already been averaged over sum(decode_lengths) tokens.
Similarly, the accuracy is also calculated over sum(decode_lengths) tokens.
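For reference, here is a minimal, self-contained sketch of that pattern. The AverageMeter below is a simplified stand-in for the running-average helper in the tutorial's utils.py, and the tensors are toy values for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class AverageMeter:
    """Running average; each update is weighted by a count n."""
    def __init__(self):
        self.sum, self.count, self.avg = 0.0, 0, 0.0
    def update(self, val, n=1):
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

criterion = nn.CrossEntropyLoss()  # default 'mean' reduction: averages over tokens
losses = AverageMeter()

# Toy batch: 2 captions decoded for 4 and 2 steps, vocabulary of 10 words
scores = torch.randn(2, 4, 10)           # (batch, max_decode_len, vocab)
targets = torch.randint(0, 10, (2, 4))   # (batch, max_decode_len)
decode_lengths = [4, 2]

# Packing drops the padded timesteps, so both tensors end up with
# sum(decode_lengths) = 6 rows, one per real token in the batch.
scores = pack_padded_sequence(scores, decode_lengths, batch_first=True).data
targets = pack_padded_sequence(targets, decode_lengths, batch_first=True).data

loss = criterion(scores, targets)  # already a per-token mean over 6 tokens

# Weighting the running average by the token count (not the batch size)
# keeps losses.avg a true per-token average; the same weighting is used
# for the top-k accuracy meter.
losses.update(loss.item(), sum(decode_lengths))
```

If the batch size were used as the count instead, batches with longer captions would contribute the same weight as batches with short ones, skewing the running per-token average.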
Hi @sgrvinod, regarding https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/b3b82636f7decb261865b92854a4741652dc413a/train.py#L179: does pack_padded_sequence() in PyTorch 0.4.1 return only a single PackedSequence object? I am seeing an error on this line; have you experienced anything similar? Thanks
```
Traceback (most recent call last):
  File "train.py", line 127, in train
    scores, _ = pack_padded_sequence(scores, decode_lengths, batch_first=True)
ValueError: too many values to unpack (expected 2)
```
For a fix, see issue #75.
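The unpacking worked in old PyTorch because PackedSequence was then a two-field namedtuple (data, batch_sizes); later versions added sorted_indices and unsorted_indices fields, so unpacking into two names raises the ValueError above. A minimal, version-agnostic sketch of the fix (toy tensors used for illustration) is to read the .data attribute instead:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

scores = torch.randn(2, 4, 10)            # (batch, max_decode_len, vocab)
targets = torch.randint(0, 10, (2, 4))    # (batch, max_decode_len)
decode_lengths = [4, 2]

# Instead of `scores, _ = pack_padded_sequence(...)`, which breaks once
# PackedSequence carries more than two fields, read the packed tensor
# via its .data attribute; this works across PyTorch versions.
scores = pack_padded_sequence(scores, decode_lengths, batch_first=True).data
targets = pack_padded_sequence(targets, decode_lengths, batch_first=True).data
```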