
Why do you use sum(decode_lengths) as the count to update losses and top-k accuracies?

Open dwang68 opened this issue 6 years ago • 3 comments

Hi, I am wondering why you use sum(decode_lengths), which as I understand it is the total number of tokens in the batch, as the count when updating the loss metrics?

Wouldn't the batch size be more appropriate?

Thanks a lot

dwang68 avatar Mar 29 '19 05:03 dwang68

Hi, I did this because the loss is computed for each token and then averaged by the total number of tokens in the batch.

Both the packed scores and targets tensors have sum(decode_lengths) rows. Therefore, in this line, the loss computed has already been averaged over sum(decode_lengths) tokens.

Similarly, the accuracy too is calculated for sum(decode_lengths) tokens.
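The bookkeeping described above can be sketched with a running-average helper like the tutorial's AverageMeter (a minimal re-implementation here, not the repo's exact code): since the per-batch loss is already a mean over that batch's tokens, the correct weight when folding it into the running average is the token count, not the batch size.

```python
class AverageMeter:
    """Running average, weighted by a per-update count."""
    def __init__(self):
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        # val is assumed to be an average over n items, so re-weight by n
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

# Hypothetical batch: real (non-<pad>) tokens per caption
decode_lengths = [5, 3, 2]
losses = AverageMeter()

# The batch loss (0.7 here) is a mean over sum(decode_lengths) = 10 tokens,
# so it is weighted by the token count when updating the running average.
losses.update(0.7, sum(decode_lengths))
# A second, smaller batch with 4 tokens and mean loss 0.5
losses.update(0.5, 4)

print(losses.avg)  # (0.7 * 10 + 0.5 * 4) / 14
```

Weighting by batch size instead would over-count short captions, since a batch with few real tokens would pull the average as strongly as one with many.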

sgrvinod avatar Mar 30 '19 03:03 sgrvinod

Hi @sgrvinod, regarding https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/b3b82636f7decb261865b92854a4741652dc413a/train.py#L179: does pack_padded_sequence() in PyTorch 0.4.1 return only a single PackedSequence object? I am seeing an error on this line; have you experienced anything similar? Thanks

```
Traceback (most recent call last):
  File "train.py", line 127, in train
    scores, _ = pack_padded_sequence(scores, decode_lengths, batch_first=True)
ValueError: too many values to unpack (expected 2)
```

xeniaqian94 avatar Jun 19 '19 18:06 xeniaqian94

For a fix, see issue #75.
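For reference, the underlying version mismatch can be illustrated in plain Python (using namedtuple stand-ins for PackedSequence, since the exact field layout depends on the PyTorch version): older PyTorch releases let you tuple-unpack the return value into two names, while newer releases added extra fields to PackedSequence, so the two-name unpack raises the ValueError above. Reading the .data attribute instead works on both.

```python
from collections import namedtuple

# Hypothetical stand-ins for torch.nn.utils.rnn.PackedSequence;
# field names mirror PackedSequence but versions/counts are assumptions.
OldPacked = namedtuple("OldPacked", ["data", "batch_sizes"])
NewPacked = namedtuple(
    "NewPacked", ["data", "batch_sizes", "sorted_indices", "unsorted_indices"]
)

old = OldPacked([1, 2, 3], [3])
new = NewPacked([1, 2, 3], [3], [0], [0])

data, _ = old  # works: exactly two fields to unpack
try:
    data, _ = new  # fails: the namedtuple now has four fields
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)

# Version-agnostic fix: access the .data attribute instead of unpacking
data = new.data
print(data)
```

Applied to the tutorial, that means replacing `scores, _ = pack_padded_sequence(...)` with `scores = pack_padded_sequence(...).data`.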

kmario23 avatar Jun 19 '19 21:06 kmario23