HARL
Bug: The environment is not reset during training
I noticed that with on-policy algorithms, data collection is done in the run function in OnPolicyBaseRunner. However, in my experiments, my environment was not reset even after it had returned done == True. Following this clue, I found that neither the run function nor any function it calls contains a reset procedure that handles this case.
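
To make the expected behaviour concrete, here is a minimal sketch of a collection loop that resets the environment whenever it reports done. This is not HARL's code: ToyEnv and collect are hypothetical names made up for illustration, and the gym-style 4-tuple step return is an assumption.

```python
class ToyEnv:
    """Hypothetical stand-in environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
        self.reset_count = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.t = 0
        self.reset_count += 1
        return self.t

    def step(self, action):
        # Advance one step; the observation is simply the step counter.
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


def collect(env, num_steps):
    """Collect `num_steps` transitions, resetting whenever done is True."""
    obs = env.reset()
    transitions = []
    for _ in range(num_steps):
        action = 0  # placeholder policy (assumption: any action is valid)
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, done))
        # The reset I expected to find somewhere in the data collection path.
        obs = env.reset() if done else next_obs
    return transitions
```

Without the reset on the last line of the loop, the environment would keep being stepped after the episode has ended, which matches what I observed.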