seq2seq-summarizer

The computation of neg_reward is wrong

Open · zzxn opened this issue · 1 comment

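This code uses the batch-averaged difference (sample ROUGE - baseline ROUGE), but that doesn't make sense mathematically. The difference should be taken per sample, because what we actually want to maximize is $J(\theta)=\frac{1}{N}\sum_{i=1}^{N}\left(r(y^i)-b_{y^i}\right)\sum_{t=1}^{T}\log p_{\theta}\left(y_t^i \mid y_{1\ldots t-1}^i, x^i\right)$, where $r(y^i)$ is the ROUGE of the $i$-th sampled summary and $b_{y^i}$ is its baseline ROUGE. With a batch-averaged difference, every sample's log-probability is scaled by the same factor, so a summary that scores below its own baseline can still have its probability increased.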

Jul 26 '20 15:07 zzxn
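A minimal plain-Python sketch of the difference, assuming each sequence log-probability $\sum_t \log p_\theta(y_t^i \mid y_{1\ldots t-1}^i, x^i)$ has already been summed into one number per sample. The function names are hypothetical and not taken from the repository, and the batch-averaged variant is an assumed reconstruction in which the mean reward difference multiplies the mean sequence log-probability. Because $-J(\theta)$ is linear in each sequence log-probability, its derivative with respect to sample $i$ is simply $-(r(y^i)-b_{y^i})/N$.

```python
from typing import List


def self_critical_loss(sample_rewards: List[float],
                       baseline_rewards: List[float],
                       seq_log_probs: List[float]) -> float:
    """Per-sample loss -J(theta) = -(1/N) * sum_i (r(y^i) - b^i) * log p(y^i | x^i).

    seq_log_probs[i] is assumed to already be sum_t log p(y_t^i | y_<t^i, x^i).
    """
    n = len(seq_log_probs)
    total = sum((r - b) * lp
                for r, b, lp in zip(sample_rewards, baseline_rewards, seq_log_probs))
    return -total / n


def grad_weights_per_sample(sample_rewards: List[float],
                            baseline_rewards: List[float]) -> List[float]:
    """d(loss)/d(seq_log_probs[i]) for the per-sample loss above."""
    n = len(sample_rewards)
    return [-(r - b) / n for r, b in zip(sample_rewards, baseline_rewards)]


def grad_weights_batch_averaged(sample_rewards: List[float],
                                baseline_rewards: List[float]) -> List[float]:
    """Same derivative when the reward difference is averaged over the batch first.

    Hypothetical variant: loss = -mean(r - b) * mean(seq_log_probs).
    """
    n = len(sample_rewards)
    mean_adv = sum(r - b for r, b in zip(sample_rewards, baseline_rewards)) / n
    return [-mean_adv / n] * n  # every sample gets the same weight


if __name__ == "__main__":
    rewards = [0.75, 0.375]  # toy ROUGE of the sampled summaries
    baselines = [0.5, 0.5]   # toy baseline ROUGE
    print(grad_weights_per_sample(rewards, baselines))      # [-0.125, 0.0625]
    print(grad_weights_batch_averaged(rewards, baselines))  # [-0.03125, -0.03125]
```

Under gradient descent a negative weight raises that sample's log-probability. In the per-sample version the second summary, which scores below its baseline, is pushed down; in the batch-averaged version it is pushed up along with the first, because the batch-mean difference is positive.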

See https://github.com/ymfa/seq2seq-summarizer/issues/7. The negative sign is already included in LogP, so the author reversed the sign in the reward instead.

Mar 18 '22 18:03 saiprabhakar
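The sign question comes down to bookkeeping: each sample's term in the loss $-J(\theta)$ carries exactly one minus sign, which may sit outside, in the reward difference (a reversed reward such as neg_reward), or in the log-probability term (a negative log-likelihood), but not in two places at once. The sketch checks this for a single sample with toy numbers; it does not reproduce the repository's code, and the dictionary keys are hypothetical labels.

```python
import math


def loss_forms(r: float, b: float, p: float) -> dict:
    """Four ways to write one sample's REINFORCE-with-baseline loss term.

    r: reward of the sampled summary, b: baseline reward,
    p: probability of the sampled summary under the model.
    """
    log_p = math.log(p)
    nll = -log_p  # negative log-likelihood: the minus sign lives here
    return {
        "minus_outside": -(r - b) * log_p,   # -J for this sample
        "reversed_reward": (b - r) * log_p,  # minus sign moved into the reward
        "negated_log_prob": (r - b) * nll,   # minus sign moved into the log-prob
        "both_flipped": (b - r) * nll,       # two minus signs: equals +J, wrong sign
    }


if __name__ == "__main__":
    forms = loss_forms(r=0.75, b=0.5, p=0.5)
    for name, value in forms.items():
        print(f"{name}: {value:.4f}")
```

The first three entries print 0.1733 and the last prints -0.1733. Minimizing any of the first three raises $\log p$ for a sample that beats its baseline; minimizing the fourth lowers it.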
