RL-Adventure

Error in Priority Update for Prioritized Replay

Open • qfettes opened this issue 7 years ago • 1 comment

It looks like you're updating the priorities in the replay buffer according to the weighted and squared TD error.

loss  = (q_value - expected_q_value.detach()).pow(2) * weights
prios = loss + 1e-5
replay_buffer.update_priorities(indices, prios.data.cpu().numpy())

However, the algorithm in the original paper updates each priority using only the absolute value of the TD error, without the importance-sampling weights and without squaring. I believe this is a mistake in your implementation.
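For illustration, here is a minimal sketch of how the two quantities could be separated. It uses NumPy rather than the repository's PyTorch tensors, and the helper name `per_loss_and_priorities`, the `eps` argument, and the mean reduction of the loss are assumptions made for this example, not code taken from the repository. The idea is that the importance-sampling weights affect only the loss, while the priorities come from the unweighted absolute TD error.

```python
import numpy as np

def per_loss_and_priorities(q_value, expected_q_value, weights, eps=1e-5):
    """Return (scalar loss, new priorities) for one sampled batch.

    Hypothetical helper for illustration only.
    """
    q_value = np.asarray(q_value, dtype=np.float64)
    expected_q_value = np.asarray(expected_q_value, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)

    td_error = q_value - expected_q_value
    # Loss: importance-sampling weights scale the squared TD error.
    # (Averaging over the batch is an assumption of this sketch.)
    loss = np.mean(weights * td_error ** 2)
    # Priorities: unweighted |TD error|, plus eps so no priority is zero.
    prios = np.abs(td_error) + eps
    return loss, prios

loss, prios = per_loss_and_priorities(
    q_value=[1.0, 2.0, 0.5],
    expected_q_value=[0.0, 2.5, 0.5],
    weights=[0.2, 1.0, 0.6],
)
```

In the snippet quoted above, the equivalent change would be to compute `prios` from the absolute TD error before the weights are applied, and to keep passing the result to `replay_buffer.update_priorities` as before.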

qfettes • Jun 03 '18 06:06

Can confirm (see image): transition priorities are updated with the magnitude of the TD error (lines 11-12). Paper for reference.

mneira10 • Sep 26 '19 22:09