RL-Adventure
Error in Priority Update for Prioritized Replay
It looks like you're updating the priorities in the replay buffer using the weighted, squared TD error:
loss = (q_value - expected_q_value.detach()).pow(2) * weights
prios = loss + 1e-5
replay_buffer.update_priorities(indices, prios.data.cpu().numpy())
However, the algorithm in the original paper updates the priority using only the absolute value of the TD error, without the importance-sampling weight. I believe this is a mistake in your implementation.
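For comparison, here is a minimal sketch of how the two could be separated, reusing the notebook's variable names (q_value, expected_q_value, weights, indices, replay_buffer) and assuming the loss is reduced with a mean before backprop:

td_error = q_value - expected_q_value.detach()
loss = (td_error.pow(2) * weights).mean()  # importance-sampling weights affect only the gradient step
prios = td_error.abs() + 1e-5              # priorities use the unweighted |TD error|
replay_buffer.update_priorities(indices, prios.detach().cpu().numpy())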
Can confirm: the paper updates transition priorities with the magnitude of the TD error (Algorithm 1, lines 11-12).
Paper for reference: Prioritized Experience Replay (Schaul et al.)
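Roughly in the paper's notation (Algorithm 1), the priority update and the weighted gradient accumulation are separate steps:

p_j \leftarrow |\delta_j|
\Delta \leftarrow \Delta + w_j \cdot \delta_j \cdot \nabla_\theta Q(S_{j-1}, A_{j-1})

i.e. the importance-sampling weight w_j enters the gradient update, not the stored priority.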