a-PyTorch-Tutorial-to-Image-Captioning

Beam Search - normalizing the scores by length?

Open solenetarride opened this issue 4 years ago • 1 comment

Thank you so much for this tutorial.

I have a question about beam search decoding.

I saw in this course https://youtu.be/XXtpJxZBa2c?t=2407 that the scores should be normalized by length at the end of beam search. The reasoning is that longer hypotheses have lower scores, since a negative log-probability is added for each predicted token.
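
In other words, as I understand it, the final score of a hypothesis $y$ (writing $x$ for the image and $y_t$ for the $t$-th predicted token) would be its average log-probability per token:

$$\text{score}(y) = \frac{1}{|y|} \sum_{t=1}^{|y|} \log P(y_t \mid y_{<t}, x)$$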

Is this normalization done somewhere? I couldn't find it.

Thank you so much, Solène

solenetarride avatar Dec 05 '20 16:12 solenetarride

Hello @solenetarride, after reviewing the implementation, I also couldn't find any length normalization. However, it may be unnecessary in this particular context: as written, the algorithm resets top_k_scores at every step, and complete_seqs_scores is derived from one or more of the current scores. In other words, rather than accumulating scores over multiple steps, the algorithm bases its decisions on the most recent word scores. You can see this in the snippet below, in how top_k_scores and top_k_words are selected: the focus is on immediate word scores rather than on an aggregate score over the length of the sequence, which is what length normalization would address. (Let me know if this does not make sense, as it is only my interpretation of the approach; a sketch of adding normalization anyway follows the snippet.)

  # For the first step, all k points will have the same scores (since same k previous words, h, c)
  if step == 1:
      top_k_scores, top_k_words = scores[0].topk(k, 0, True, True)  # (s)
  else:
      # Unroll and find top scores, and their unrolled indices
      top_k_scores, top_k_words = scores.view(-1).topk(k, 0, True, True)  # (s)
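
That said, if you did want length normalization, a minimal sketch (my own untested addition, not part of the tutorial's code) would be to divide each completed hypothesis's accumulated score by its length before picking the best sequence. The names complete_seqs and complete_seqs_scores below are meant to mirror the tutorial's lists of finished sequences and their scores:

  # Hypothetical sketch, not in the tutorial: choose the best completed hypothesis
  # after dividing each accumulated score by the length of its sequence.
  def pick_best_length_normalized(complete_seqs, complete_seqs_scores):
      # complete_seqs: list of token-id lists; complete_seqs_scores: their scores
      normalized = [score / len(seq) for seq, score in zip(complete_seqs, complete_seqs_scores)]
      best_i = max(range(len(normalized)), key=lambda i: normalized[i])
      return complete_seqs[best_i]

This would replace the final step where the highest-scoring entry of complete_seqs_scores is selected, and it leaves the rest of the beam search untouched.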

gsoykan avatar Jan 03 '24 16:01 gsoykan