The design of the rewarder

Open qingyue2014 opened this issue 6 years ago • 0 comments

In your paper, the rewarder network is modeled as a simple feed-forward neural network. When I tried to understand it through this code, I found that it is modeled as an LSTM instead. The reward value at each step comes from the prediction of the LSTM network. Why does the code differ from the paper?
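To make the distinction concrete, here is a minimal sketch of the two designs being compared. It is not taken from the repository or the paper: the function names, weights, and toy input are all hypothetical, and a plain tanh recurrent cell stands in for the LSTM to keep the example short.

```python
import math


def sigmoid(z):
    """Squash a raw score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def feed_forward_reward(features, w, b):
    """Score one step from its own features only (no memory of earlier steps)."""
    z = sum(wi * xi for wi, xi in zip(w, features)) + b
    return sigmoid(z)


def recurrent_rewards(sequence, w_in, w_rec, w_out):
    """Score each step from a hidden state that summarizes all earlier steps.

    Assumption: a single-unit tanh cell replaces the LSTM for brevity.
    """
    h = 0.0  # hidden state carried across time steps
    rewards = []
    for features in sequence:
        h = math.tanh(sum(wi * xi for wi, xi in zip(w_in, features)) + w_rec * h)
        rewards.append(sigmoid(w_out * h))
    return rewards


# Hypothetical toy input: the same feature vector at two consecutive steps.
sequence = [[1.0, 0.0], [1.0, 0.0]]

ff = [feed_forward_reward(x, [0.5, -0.5], 0.0) for x in sequence]
rec = recurrent_rewards(sequence, [0.5, -0.5], 0.8, 1.0)

print(ff)   # both rewards are equal: each step is scored in isolation
print(rec)  # the rewards differ: the second step also depends on the first
```

With a feed-forward rewarder, identical inputs always receive identical rewards. With a recurrent rewarder, the reward at each step also depends on what came before, which seems to be what the code does.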

Aug 26 '19 12:08 qingyue2014