QDax
PGAME add dead transition to Replay Buffer
Hi :)
It seems that the current implementation of PGAME adds every transition collected in the environment to the replay buffer, including transitions that occur after the individual is dead and the environment has returned `done = 1`.
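To make the issue concrete, here is a minimal sketch of how such transitions could be identified. It is not QDax code: the helper name `valid_transition_mask` is hypothetical, and it assumes the `done` flags of one rollout are available as a 1-D array in which `done` stays at 1 once the individual has died.

```python
import numpy as np


def valid_transition_mask(dones: np.ndarray) -> np.ndarray:
    """Return 1.0 for transitions collected while the individual was alive.

    `dones` has shape (episode_length,). The transition that first sets
    done = 1 is kept (it is the real terminal transition); every
    transition collected after it is masked out.
    """
    dones = np.asarray(dones, dtype=np.float32)
    # Number of done flags seen strictly before each step.
    dones_before = np.cumsum(dones) - dones
    return (dones_before == 0).astype(np.float32)


# Hypothetical rollout: the individual dies on the third step.
dones = np.array([0, 0, 1, 1, 1])
mask = valid_transition_mask(dones)
```

With this rollout, only the first three transitions are marked as valid. The last two are the "dead" transitions that currently end up in the replay buffer.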
I have only run initial tests, but this seems to slightly affect the performance of the PGAME algorithm.
Hey :wave:
This is completely true!
One way to avoid this is to set `auto_reset=True` when instantiating the Brax environment. New episodes will then be played and added to the buffer instead. This data will be "good" data and will hence have no negative impact on the training process.
This sounds like a better way to handle it than removing dead individuals' transitions from the buffer, because it avoids a loss of data efficiency: with filtering, dead transitions would still be played and counted in the total number of steps.
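As a toy illustration of the difference, here is a sketch in plain Python. It does not use the Brax or QDax API: `ToyEnv`, its `auto_reset` flag, and `count_dead_transitions` are hypothetical stand-ins, and the individual is simply assumed to die after a fixed number of steps.

```python
class ToyEnv:
    """Hypothetical environment: the individual dies after `lifetime` steps."""

    def __init__(self, lifetime: int, auto_reset: bool):
        self.lifetime = lifetime
        self.auto_reset = auto_reset
        self.age = 0
        self.dead = False

    def step(self) -> tuple[bool, bool]:
        """Play one step and return (done, collected_while_dead)."""
        was_dead = self.dead
        if not was_dead:
            self.age += 1
            if self.age >= self.lifetime:
                self.dead = True
        done = self.dead
        if done and self.auto_reset:
            # Start a fresh episode so that the next transition is a live one.
            self.age = 0
            self.dead = False
        return done, was_dead


def count_dead_transitions(auto_reset: bool, lifetime: int = 3, steps: int = 10) -> int:
    """Count how many of `steps` transitions were collected after death."""
    env = ToyEnv(lifetime=lifetime, auto_reset=auto_reset)
    return sum(env.step()[1] for _ in range(steps))


dead_without_reset = count_dead_transitions(auto_reset=False)
dead_with_reset = count_dead_transitions(auto_reset=True)
```

In this toy setting, 7 of the 10 steps are spent on a dead individual without auto-reset, while none are with auto-reset. The step budget is the same in both cases, so filtering alone would discard most of it.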
Would this solution fit you?
A caveat will be added to the documentation to make sure users are aware of this dangerous behavior.