Wrong recurrent state accessed in R2D2 Learner

Open ostap-viniavskyi opened this issue 3 years ago • 0 comments

In the R2D2 learner, you sample learning trajectories from Reverb in a format where, at each index t, the sequence holds the observation x_t, the action a_t, the reward r_t, and state_t, the LSTM recurrent state that the network had after processing x_t. Doesn't this mean that when the learner applies unroll (link to code), it starts from an LSTM state that already includes step t, i.e. a state from one step in the future, effectively leaking information from that future state?
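To make the concern concrete, here is a toy illustration of the off-by-one. It is a hypothetical sketch, not Acme code: `core_step`, `act`, and `unroll` are made-up names, and the "LSTM" is replaced by a scalar recurrence `h = 0.5 * h + x`. The actor records both the state before and the state after each observation. Re-unrolling a sampled subsequence reproduces the actor's outputs only when it starts from the pre-observation state; starting from the post-observation state feeds x_t into the state twice.

```python
from typing import List, Tuple


def core_step(state: float, obs: float) -> Tuple[float, float]:
    """Hypothetical toy recurrent core standing in for the LSTM.

    Returns (output, new_state); here the output is just the new state.
    """
    new_state = 0.5 * state + obs
    return new_state, new_state


def act(observations: List[float], initial_state: float = 0.0):
    """Toy actor loop that records the state before and after each x_t."""
    state = initial_state
    outputs, pre_states, post_states = [], [], []
    for obs in observations:
        pre_states.append(state)          # state the core receives with x_t
        out, state = core_step(state, obs)
        post_states.append(state)         # state after processing x_t
        outputs.append(out)
    return outputs, pre_states, post_states


def unroll(observations: List[float], start_state: float) -> List[float]:
    """Toy learner-side unroll over a sampled subsequence."""
    state = start_state
    outs = []
    for obs in observations:
        out, state = core_step(state, obs)
        outs.append(out)
    return outs


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    outputs, pre_states, post_states = act(obs)
    t = 2  # learner samples the subsequence starting at index t
    print(outputs[t:])                        # [4.25, 6.125]
    print(unroll(obs[t:], pre_states[t]))     # [4.25, 6.125]  matches the actor
    print(unroll(obs[t:], post_states[t]))    # [5.125, 6.5625]  x_t counted twice
```

Whether Acme's R2D2 actor actually stores the pre- or post-observation state is exactly what the issue asks; the sketch only shows why the distinction matters.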

ostap-viniavskyi, Aug 16 '22 07:08