QDax

PGAME adds dead transitions to the replay buffer

Open · manon-but-yes opened this issue · 2 comments

Hi :)

It seems that the current implementation of PGAME adds every transition collected in the environment to the replay buffer, including transitions that occur after the individual is dead, i.e. after the environment has returned `done = 1`. I have only run initial tests, but this seems to slightly degrade PGAME's performance.
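To make "dead transitions" concrete, here is a minimal JAX sketch of how they could be identified, assuming the rollout's done flags are stored as a float array of shape `(batch, T)`. The function name `alive_mask` is hypothetical and not part of QDax or Brax. It keeps every step up to and including the first termination of each row and flags everything after it as dead.

```python
import jax.numpy as jnp


def alive_mask(dones: jnp.ndarray) -> jnp.ndarray:
    """Hypothetical helper: mask of transitions collected before the episode ended.

    dones: float array of shape (batch, T), 1.0 where the environment
    reported done. Returns 1.0 for every step up to and including the
    first done of each row, and 0.0 for the "dead" steps after it.
    """
    # Number of done flags seen strictly before each step.
    dones_before = jnp.cumsum(dones, axis=-1) - dones
    return (dones_before == 0).astype(jnp.float32)


dones = jnp.array([[0.0, 0.0, 1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0, 1.0, 0.0]])
mask = alive_mask(dones)
```

Counting dones strictly before each step (rather than clipping the cumulative sum) keeps the terminating transition itself, which is still valid experience. Such a mask could be used to drop or zero-weight transitions before they reach the buffer, but the masked-out steps were still simulated.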

manon-but-yes, Jul 30 '22

Hey :wave:

This is completely true!

One way to avoid this is to set `auto_reset=True` when instantiating the Brax environment. A new episode is then started as soon as the current one terminates, so the transitions added to the buffer come from valid episodes. This is "good" data and will not harm the training process.
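A minimal, self-contained JAX sketch of what auto-resetting changes, assuming a hypothetical 1-D toy environment: `State`, `step`, `auto_reset_step` and `rollout` are illustrative names, not Brax or QDax APIs, and this only mimics the idea behind Brax's auto-reset rather than reproducing its implementation. When a step terminates, the state is swapped back to the initial state with `jnp.where`, so the following transitions belong to a fresh episode instead of a dead one.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    pos: jnp.ndarray   # hypothetical 1-D position of the agent
    done: jnp.ndarray  # 1.0 if this step terminated the episode


def step(state: State, action: jnp.ndarray) -> State:
    # Toy dynamics: the agent "dies" once |pos| >= 3. Nothing stops the
    # caller from stepping again afterwards, which yields dead transitions.
    pos = state.pos + action
    done = (jnp.abs(pos) >= 3.0).astype(jnp.float32)
    return State(pos=pos, done=done)


def auto_reset_step(state: State, action: jnp.ndarray, init_state: State) -> State:
    next_state = step(state, action)
    # On termination, keep done=1 for this transition but restart from the
    # initial position, so the next transition belongs to a fresh episode.
    pos = jnp.where(next_state.done > 0, init_state.pos, next_state.pos)
    return State(pos=pos, done=next_state.done)


def rollout(init_state: State, actions: jnp.ndarray, auto_reset: bool):
    def body(state, action):
        if auto_reset:  # Python bool: resolved once when the body is traced
            nxt = auto_reset_step(state, action, init_state)
        else:
            nxt = step(state, action)
        return nxt, (nxt.pos, nxt.done)

    _, (positions, dones) = jax.lax.scan(body, init_state, actions)
    return positions, dones


init = State(pos=jnp.zeros(()), done=jnp.zeros(()))
actions = jnp.ones(5)
print(rollout(init, actions, auto_reset=False))
print(rollout(init, actions, auto_reset=True))
```

With five unit actions, the run without auto-reset keeps stepping past termination (positions 4 and 5 are dead transitions), while the auto-reset run restarts from 0 after the terminating step, so every recorded step is usable. In this sketch the terminating step reports the reset position; a real implementation has to decide how to keep the true final observation.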

This sounds like a better way to handle it than removing dead transitions from the buffer, because it avoids a loss of data efficiency: dead transitions are still simulated and counted in the total number of environment steps, so discarding them would waste part of the step budget.

Would this solution fit you?

felixchalumeau, Aug 19 '22

A caveat will be added to the documentation to make sure users are aware of this dangerous behavior.

felixchalumeau, Sep 06 '22