Irl_gen
The design of the rewarder
In your paper, the rewarder network is modeled as a simple feed-forward neural network. When I tried to understand it through this code, I found that it is modeled as an LSTM instead: the reward value at each time step comes from the LSTM's prediction. Why the difference?
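To make the question concrete, here is a minimal NumPy sketch of the two designs as I understand them. It is not taken from the repository or the paper: the function names, the layer sizes, and the scalar output head are all my own assumptions. The only point is that a feed-forward rewarder scores each step from the current input alone, while an LSTM rewarder's score at step t also depends on earlier steps through its hidden state.

```python
import numpy as np

EMB, HID, T = 8, 16, 5  # hypothetical embedding size, hidden size, sequence length
rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def feedforward_rewards(x, W1, b1, w2, b2):
    """Feed-forward rewarder: the reward at step t depends only on x[t]."""
    h = np.tanh(x @ W1 + b1)  # (T, HID)
    return h @ w2 + b2        # (T,)


def lstm_rewards(x, W, U, b, w_out, b_out):
    """LSTM rewarder: the reward at step t depends on x[0..t] via (h, c)."""
    h = np.zeros(HID)
    c = np.zeros(HID)
    rewards = []
    for x_t in x:
        z = x_t @ W + h @ U + b        # all four gate pre-activations, (4*HID,)
        i, f, o, g = np.split(z, 4)    # input, forget, output gates and candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        rewards.append(h @ w_out + b_out)
    return np.array(rewards)


# Random, untrained parameters (assumed shapes, for illustration only).
W1 = 0.1 * rng.normal(size=(EMB, HID))
b1 = np.zeros(HID)
w2 = rng.normal(size=HID)
W = 0.1 * rng.normal(size=(EMB, 4 * HID))
U = 0.1 * rng.normal(size=(HID, 4 * HID))
b = np.zeros(4 * HID)
w_out = rng.normal(size=HID)

x = rng.normal(size=(T, EMB))
x_changed = x.copy()
x_changed[0] += 1.0  # perturb only the first step

r_ff = feedforward_rewards(x, W1, b1, w2, 0.0)
r_ff_changed = feedforward_rewards(x_changed, W1, b1, w2, 0.0)
r_lstm = lstm_rewards(x, W, U, b, w_out, 0.0)
r_lstm_changed = lstm_rewards(x_changed, W, U, b, w_out, 0.0)

# Later feed-forward rewards are unaffected; later LSTM rewards shift.
ff_later_unchanged = np.allclose(r_ff[1:], r_ff_changed[1:])
lstm_later_unchanged = np.allclose(r_lstm[1:], r_lstm_changed[1:])
```

In this sketch, perturbing only the first step leaves the later feed-forward rewards unchanged but shifts the later LSTM rewards, which is the behavioral difference I am asking about.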