seq2seq-summarizer
The computation of neg_reward is wrong
This code uses the batch-averaged (sample_rouge - baseline rouge), but that doesn't make sense mathematically. This term should be sample-wise, because what we really want to maximize is this:
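Presumably this is the usual self-critical policy-gradient objective. A sketch in assumed notation (none of these symbols come from the repository): $x_i$ is the $i$-th source document in a batch of $N$, $y^{s}_{i}$ its sampled summary, $\hat{y}_{i}$ its greedy baseline summary, and $r(\cdot)$ the ROUGE score.

$$
J(\theta) = \frac{1}{N}\sum_{i=1}^{N}\bigl(r(y^{s}_{i}) - r(\hat{y}_{i})\bigr)\,\log p_\theta(y^{s}_{i}\mid x_{i})
$$

Each log-probability is weighted by its own reward gap. Averaging the gap over the batch first would instead give every sample the same weight, $\frac{1}{N}\sum_{j}\bigl(r(y^{s}_{j}) - r(\hat{y}_{j})\bigr)$, regardless of whether that particular sample beat its baseline.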
See https://github.com/ymfa/seq2seq-summarizer/issues/7 . The negative sign is already included in the log P term, so the author has reversed it in the reward.
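To make both points concrete, here is a minimal pure-Python sketch. It is not the repository's code: the numbers and the function names (`batch_averaged_loss`, `sample_wise_loss`, `sample_wise_loss_neg_reward`) are made up for illustration, and `log_p` is assumed to hold the summed log-probability of each sampled summary.

```python
# Hypothetical per-sample values for a batch of three summaries.
sample_rouge = [0.5, 0.2, 0.4]    # ROUGE of the sampled summaries
baseline_rouge = [0.3, 0.3, 0.3]  # ROUGE of the greedy (baseline) summaries
log_p = [-2.0, -1.0, -4.0]        # log-probability of each sampled summary


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def batch_averaged_loss(sample_r, baseline_r, log_p):
    """Average the reward gap over the batch first, then scale the mean log P."""
    mean_gap = mean([s - b for s, b in zip(sample_r, baseline_r)])
    return -mean_gap * mean(log_p)


def sample_wise_loss(sample_r, baseline_r, log_p):
    """Weight each sample's log P by its own reward gap, then average."""
    return -mean([(s - b) * lp for s, b, lp in zip(sample_r, baseline_r, log_p)])


def sample_wise_loss_neg_reward(sample_r, baseline_r, log_p):
    """Same loss, with the minus sign folded into the reward (baseline - sample)."""
    return mean([(b - s) * lp for s, b, lp in zip(sample_r, baseline_r, log_p)])


if __name__ == "__main__":
    print(batch_averaged_loss(sample_rouge, baseline_rouge, log_p))
    print(sample_wise_loss(sample_rouge, baseline_rouge, log_p))
    print(sample_wise_loss_neg_reward(sample_rouge, baseline_rouge, log_p))
```

The first two values differ, so the batch-averaged form is not just a rescaling of the sample-wise one. The last two agree, which shows that the minus sign only needs to appear once, on either the reward or the log P factor.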