Wenjun Li

2 comments by Wenjun Li

> The exact answer depends on the algorithm you use, but at least with DQN the code re-creates the replay buffer on every call to `learn`.
>
> However in...
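The sketch below illustrates the DQN behaviour quoted above. If `learn` re-creates the replay buffer on every call, transitions gathered during earlier calls are discarded. `ToyReplayBuffer`, `ToyAgent` and the `keep_buffer` flag are hypothetical names chosen for illustration. They are not the library's actual classes or parameters.

```python
from collections import deque


class ToyReplayBuffer:
    """Hypothetical fixed-size FIFO store of transitions (illustration only)."""

    def __init__(self, size):
        # Oldest transitions are dropped once `size` is reached.
        self.storage = deque(maxlen=size)

    def add(self, transition):
        self.storage.append(transition)

    def __len__(self):
        return len(self.storage)


class ToyAgent:
    """Hypothetical agent; `keep_buffer` decides whether learn() reuses the buffer."""

    def __init__(self, buffer_size=1000, keep_buffer=False):
        self.buffer_size = buffer_size
        self.keep_buffer = keep_buffer
        self.replay_buffer = None

    def learn(self, total_timesteps):
        if self.replay_buffer is None or not self.keep_buffer:
            # Re-creating the buffer throws away every transition from earlier calls.
            self.replay_buffer = ToyReplayBuffer(self.buffer_size)
        for t in range(total_timesteps):
            # Placeholder standing in for (obs, action, reward, next_obs, done).
            self.replay_buffer.add((t, 0, 0.0, t + 1, False))
        return self
```

Calling `learn` twice shows the difference:

```python
fresh = ToyAgent(keep_buffer=False)
fresh.learn(100)
fresh.learn(100)
print(len(fresh.replay_buffer))  # → 100 (first call's samples were discarded)

kept = ToyAgent(keep_buffer=True)
kept.learn(100)
kept.learn(100)
print(len(kept.replay_buffer))   # → 200
```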

> > Thanks for your swift response. I am using TRPO and PPO. So, you mean stable-baselines3 would be more suitable for this problem (because stable-baselines3 will collect previous samples...