Finish prioritised experience replay

Open Kaixhin opened this issue 9 years ago • 2 comments

Rank-based prioritised experience replay appears to be working, but technically needs some changes. Instead of storing terminal states with a priority of 0, they should not be stored at all. This requires more checks, as the elements in the experience replay memory and the elements in the priority queue will differ.

Secondly, proportional prioritised experience replay still needs to be implemented. See here and here for an implementation of the sum binary tree.
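As a rough illustration of the sum binary tree mentioned above, here is a minimal Python sketch (not the repository's code, and not the linked implementations). The class name `SumTree` and its methods are hypothetical. Each leaf holds one transition's priority and each internal node holds the sum of its children, so sampling proportionally to priority is a single root-to-leaf descent. The sketch assumes a power-of-two capacity so that leaves stay in index order.

```python
import random


class SumTree:
    """Binary sum tree stored in a flat list (hypothetical helper).

    Assumes `capacity` is a power of two so leaves stay in index order.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # nodes[1] is the root; leaves live at capacity .. 2 * capacity - 1.
        self.nodes = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of leaf `index` and refresh the sums above it."""
        pos = index + self.capacity
        self.nodes[pos] = priority
        pos //= 2
        while pos >= 1:
            self.nodes[pos] = self.nodes[2 * pos] + self.nodes[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities."""
        return self.nodes[1]

    def find(self, value):
        """Return the leaf index whose cumulative priority range contains `value`."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.nodes[left]:
                pos = left
            else:
                value -= self.nodes[left]
                pos = left + 1
        return pos - self.capacity

    def sample(self):
        """Draw a leaf index with probability proportional to its priority."""
        return self.find(random.random() * self.total())


tree = SumTree(4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)
index = tree.sample()  # index 3 is drawn four times as often as index 0
```

Both `update` and `find` touch one node per tree level, so each costs O(log n) in the memory size.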

For reference, below are results from a working implementation of rank-based PER on Frostbite: scores

Kaixhin avatar Jun 16 '16 14:06 Kaixhin

Maybe we can store each experience as a tuple like (s_t, a, r, s_t_1, t). With this pattern, terminal states would not need their own entries in the experience replay memory. Usually t is 0, and t == 1 would generate the tuple (s, a, r, TERMINAL_STATE, 1).
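A minimal Python sketch of this tuple layout, purely as an illustration. The names `Transition`, `ReplayMemory` and `td_target` are hypothetical, and the Q-learning style target is an assumption about how the flag would be used. The terminal flag travels with the transition that leads into the terminal state, so the terminal state never needs its own slot and the flag only masks the bootstrap term.

```python
from collections import namedtuple

# Hypothetical record mirroring (s_t, a, r, s_t_1, t) from the comment above.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "terminal"]
)


class ReplayMemory:
    """Fixed-size circular buffer of transitions (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.transitions = []
        self.position = 0  # next slot to write

    def store(self, state, action, reward, next_state, terminal):
        transition = Transition(state, action, reward, next_state, int(terminal))
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
        else:
            # Buffer is full: overwrite the oldest entry.
            self.transitions[self.position] = transition
        self.position = (self.position + 1) % self.capacity


def td_target(transition, next_value, discount=0.99):
    """Bootstrap from next_value only when the transition is not terminal."""
    return transition.reward + discount * (1 - transition.terminal) * next_value


memory = ReplayMemory(capacity=2)
memory.store("s0", 1, 0.0, "s1", False)
memory.store("s1", 0, 1.0, "TERMINAL_STATE", True)
```

With this layout every stored entry is a valid transition to learn from, so nothing has to be kept at a priority of 0.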

Damcy avatar Jul 25 '16 08:07 Damcy

Note: It might be worth subclassing the Heap from torchlib for the priority queue.
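For illustration only, here is a minimal Python sketch of the kind of priority queue this would need. It does not mirror torchlib's Heap API, and the class name `IndexedMaxHeap` is hypothetical. It is a max-heap keyed by replay memory index that tracks where each key currently sits, so a transition's priority can be updated in place. Because entries are looked up by key, the heap can hold fewer elements than the replay memory.

```python
class IndexedMaxHeap:
    """Max-heap of [priority, key] pairs with a key -> position map (hypothetical)."""

    def __init__(self):
        self.heap = []      # list of [priority, key]
        self.position = {}  # key -> index into self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.position[self.heap[i][1]] = i
        self.position[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] <= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] > self.heap[largest][0]:
                    largest = child
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def push(self, key, priority):
        """Insert a new key, e.g. a replay memory index."""
        self.heap.append([priority, key])
        self.position[key] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def update(self, key, priority):
        """Change the priority of an existing key and restore heap order."""
        i = self.position[key]
        self.heap[i][0] = priority
        self._sift_up(i)
        self._sift_down(self.position[key])

    def peek_max(self):
        """Key with the highest priority."""
        return self.heap[0][1]


queue = IndexedMaxHeap()
queue.push(0, 1.0)  # memory index 0, priority 1.0
queue.push(1, 3.0)
queue.update(0, 5.0)
```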

Kaixhin avatar Aug 16 '16 10:08 Kaixhin