beam search ranked by ppl instead of probability
The model is trained by minimizing log-perplexity, but beam search ranks candidates by summed log-probs. A quick test on my side shows that ranking by log-perplexity in beam search can give higher BLEU, ROUGE, and METEOR scores, which is also more consistent with the training objective.
doh! Sorry, can you elaborate? I was under the impression that the logprobs were already normalized due to the use of a LogSoftMax layer, so this should already be a correctly normalized log-perplexity? What change did you make, exactly?
Hi, the nn.LanguageModelCriterion is optimized by minimizing -logprobs / #total_number_wds within a batch, which I would consider a log-ppl. However, during beam search we choose the top K beams with the highest summed logprobs, via L169: local function compare(a,b) return a.p > b.p end -- used downstream. I think ranking by ppl instead, i.e. return a.ppl < b.ppl, is more reasonable. What do you think?
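To make the distinction concrete, here is a minimal Python sketch (the actual code is Lua/Torch; the toy beams and scores below are made up for illustration). Summed log-prob gets more negative as a sequence grows, so ranking by the raw sum is biased toward short beams; dividing by length, which matches the per-word training criterion, removes that bias:

```python
# Hypothetical finished beams: (tokens, total log-prob).
beams = [
    (["a", "b"], -2.0),            # 2 tokens, avg log-prob -1.0
    (["a", "b", "c", "d"], -3.0),  # 4 tokens, avg log-prob -0.75
]

# Ranking by total log-prob (higher is better): the short beam wins.
by_logprob = max(beams, key=lambda b: b[1])

# Ranking by log-perplexity = -total_logprob / length (lower is better):
# this matches the per-word objective, and the longer beam wins.
def log_ppl(beam):
    tokens, logprob = beam
    return -logprob / len(tokens)

by_ppl = min(beams, key=log_ppl)

print(by_logprob[0])  # ['a', 'b']
print(by_ppl[0])      # ['a', 'b', 'c', 'd']
```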
When I create the things I end up sorting, I create them on L218 as
table.insert(candidates, {c=ix[{ q,c }], q=q, p=candidate_logprob, r=local_logprob })
so in fact the .p field holds the logprob, which I end up sorting by. I'm not using the raw probabilities. And there is no .ppl field here.
You are absolutely right here! But I think the ranking of done_beams needs to consider the log-ppls.
What I did is add one more function called "compare_ppl" and compute the ppl for each done_beam, so that beam search ranks done_beams by ascending log-ppl instead of descending log-prob. What do you think?
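A minimal Python sketch of that proposed change (the actual code is Lua/Torch; compare_ppl and done_beams follow the names in this thread, while the dict keys and toy values are assumptions for illustration):

```python
from functools import cmp_to_key

def compare_ppl(a, b):
    # Mirrors the proposed Lua comparator "return a.ppl < b.ppl":
    # ascending log-ppl (lower is better), replacing the original
    # "return a.p > b.p" (descending summed log-prob).
    return (a["ppl"] > b["ppl"]) - (a["ppl"] < b["ppl"])

def rank_done_beams(done_beams):
    for beam in done_beams:
        # log-ppl = -(summed log-probs) / sequence length
        beam["ppl"] = -beam["p"] / len(beam["seq"])
    done_beams.sort(key=cmp_to_key(compare_ppl))
    return done_beams

done_beams = [
    {"seq": [1, 2], "p": -2.0},
    {"seq": [1, 2, 3, 4], "p": -3.0},
]
ranked = rank_done_beams(done_beams)
print(ranked[0]["seq"])  # [1, 2, 3, 4]
```

With the original descending-logprob sort, the short beam would have come first; under ascending log-ppl the longer, lower-perplexity beam is ranked first.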
@lichengunc I think you're right. I also got higher performance after sorting by ppl.