beam search ranked by ppl instead of probability
The model is trained by minimizing log-perplexity, but beam search ranks candidates by summed log-probs. A quick test on my side shows that ranking by log-perplexity in beam search can give higher BLEU, ROUGE, and METEOR scores, which is also more consistent with the training objective.
doh! Sorry, can you elaborate? I was under the impression that the logprobs were already normalized due to the use of a LogSoftMax layer, so this should already be a correctly normalized log-perplexity? What change did you make, exactly?
Hi, the nn.LanguageModelCriterion is optimized by minimizing -logprobs / #total_number_wds within a batch, which I would consider a log-ppl. However, during beam search we choose the top K beams with the highest summed logprobs, via L169: local function compare(a,b) return a.p > b.p end -- used downstream. I think ranking by ppl instead, i.e. return a.ppl < b.ppl, is more reasonable. What do you think?
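To make the distinction concrete, here is a minimal Python sketch (the actual code is Lua/Torch; the toy beams and scores below are made up for illustration). Summed log-prob gets more negative as a sequence grows, so ranking by the raw sum is biased toward short beams; dividing by length, which matches the per-word training criterion, removes that bias:

```python
# Hypothetical finished beams: (tokens, total log-prob).
beams = [
    (["a", "b"], -2.0),            # 2 tokens, avg log-prob -1.0
    (["a", "b", "c", "d"], -3.0),  # 4 tokens, avg log-prob -0.75
]

# Ranking by total log-prob (higher is better): the short beam wins.
by_logprob = max(beams, key=lambda b: b[1])

# Ranking by log-perplexity = -total_logprob / length (lower is better):
# this matches the per-word objective, and the longer beam wins.
def log_ppl(beam):
    tokens, logprob = beam
    return -logprob / len(tokens)

by_ppl = min(beams, key=log_ppl)

print(by_logprob[0])  # ['a', 'b']
print(by_ppl[0])      # ['a', 'b', 'c', 'd']
```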
When I create the things I end up sorting, I create them on L218 as
table.insert(candidates, {c=ix[{ q,c }], q=q, p=candidate_logprob, r=local_logprob })
so in fact the .p field holds the logprob, which I end up sorting by. I'm not using the raw probabilities. And there is no .ppl field here.
You are absolutely right here! But I think the ranking of done_beams needs to consider the log-ppls.
What I did is add one more function called "compare_ppl" and compute the ppl for each done_beam, so that beam search ranks done_beams by ascending log-ppl instead of descending log-prob. What do you think?
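A minimal Python sketch of that proposed change (the actual code is Lua/Torch; compare_ppl and done_beams follow the names in this thread, while the dict keys and toy values are assumptions for illustration):

```python
from functools import cmp_to_key

def compare_ppl(a, b):
    # Mirrors the proposed Lua comparator "return a.ppl < b.ppl":
    # ascending log-ppl (lower is better), replacing the original
    # "return a.p > b.p" (descending summed log-prob).
    return (a["ppl"] > b["ppl"]) - (a["ppl"] < b["ppl"])

def rank_done_beams(done_beams):
    for beam in done_beams:
        # log-ppl = -(summed log-probs) / sequence length
        beam["ppl"] = -beam["p"] / len(beam["seq"])
    done_beams.sort(key=cmp_to_key(compare_ppl))
    return done_beams

done_beams = [
    {"seq": [1, 2], "p": -2.0},
    {"seq": [1, 2, 3, 4], "p": -3.0},
]
ranked = rank_done_beams(done_beams)
print(ranked[0]["seq"])  # [1, 2, 3, 4]
```

With the original descending-logprob sort, the short beam would have come first; under ascending log-ppl the longer, lower-perplexity beam is ranked first.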
@lichengunc I think you're right. I also got higher performance after sorting by ppl.