seq2seq-summarizer

The computation of neg_reward is wrong

Open zzxn opened this issue 4 years ago • 1 comments

This code uses a batch-averaged (sample_rouge - baseline rouge), but that doesn't make sense mathematically. This term should be sample-wise, because what we really want to maximize is this:
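
To illustrate the difference between the two forms, here is a minimal sketch in plain Python. It is not the repository's code: the function names, the toy numbers, and the sign convention (log-probabilities are negative and the objective is maximized) are all assumptions made for the example. It only shows that averaging the reward difference over the batch before multiplying is not the same as weighting each sample's log-probability by its own reward difference.

```python
# Hypothetical helpers for illustration only; not taken from seq2seq-summarizer.

def batch_averaged_objective(sample_rouge, baseline_rouge, log_probs):
    """Multiply the batch-mean reward difference by the batch-mean log-probability."""
    n = len(log_probs)
    mean_advantage = sum(s - b for s, b in zip(sample_rouge, baseline_rouge)) / n
    mean_log_prob = sum(log_probs) / n
    return mean_advantage * mean_log_prob


def sample_wise_objective(sample_rouge, baseline_rouge, log_probs):
    """Weight each sample's log-probability by its own reward difference, then average."""
    n = len(log_probs)
    total = sum(
        (s - b) * lp
        for s, b, lp in zip(sample_rouge, baseline_rouge, log_probs)
    )
    return total / n


# Toy batch of two samples (made-up values).
sample_rouge = [0.5, 0.25]
baseline_rouge = [0.25, 0.5]
log_probs = [-1.0, -3.0]

print(batch_averaged_objective(sample_rouge, baseline_rouge, log_probs))
print(sample_wise_objective(sample_rouge, baseline_rouge, log_probs))
```

In this toy batch the reward differences are +0.25 and -0.25, so their batch mean is zero and the batch-averaged form gives no learning signal at all, while the sample-wise form still rewards the first sample and penalizes the second.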

zzxn avatar Jul 26 '20 15:07 zzxn

Check https://github.com/ymfa/seq2seq-summarizer/issues/7 . The negative sign is already included in the LogP, so the author has reversed the sign in the reward.

saiprabhakar avatar Mar 18 '22 18:03 saiprabhakar