Antonin RAFFIN
> At this point, wouldn't it be clearer to put the code into common/buffers.py? Yes, probably, but the most important thing for now is to test the implementation (performance test,...
> performance test, check we can reproduce the results from the paper After some initial tests on Breakout, following the hyperparameters from the paper, the run neither improved nor worsened DQN...
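For anyone who wants a reference point while testing, here is a minimal numpy-only sketch of the proportional prioritization logic from Schaul et al. (priorities only, no transition storage; the class name and interface are hypothetical and this is not the code from this PR):

```python
import numpy as np


class SimplePrioritizedSampler:
    """Minimal proportional PER sketch: p_i = (|td_error_i| + eps) ** alpha,
    importance weights w_i = (N * P(i)) ** -beta, normalized by their max."""

    def __init__(self, buffer_size: int, alpha: float = 0.6, beta: float = 0.4, eps: float = 1e-6):
        self.buffer_size = buffer_size
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.priorities = np.zeros(buffer_size, dtype=np.float64)
        self.pos = 0
        self.full = False

    def on_add(self) -> None:
        # New transitions get the current max priority so they are sampled at least once
        max_prio = self.priorities.max() if (self.full or self.pos > 0) else 1.0
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.buffer_size
        self.full = self.full or self.pos == 0

    def sample_indices(self, batch_size: int):
        upper = self.buffer_size if self.full else self.pos
        probs = self.priorities[:upper] ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(upper, size=batch_size, p=probs)
        weights = (upper * probs[indices]) ** (-self.beta)
        weights /= weights.max()
        return indices, weights

    def update_priorities(self, indices, td_errors) -> None:
        self.priorities[indices] = np.abs(td_errors) + self.eps
```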
An update from my side: I just added CNN support to SBX (SB3 + Jax) DQN, and it is 10x faster than the PyTorch equivalent: https://github.com/araffin/sbx/pull/49 That should allow us to...
Some additional update: when trying to plug the PER implementation from this PR into the Jax DQN implementation, the experience replay was the bottleneck (by a good margin, making things...
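For reference, a crude way to check that the replay buffer, and not the Jax update, dominates the step time could look like this (`buffer` and `jitted_update` are placeholders for the actual replay buffer and jitted training step, not SBX code):

```python
import time

import jax


def profile_step(buffer, jitted_update, batch_size: int = 32, n_iters: int = 1000) -> None:
    """Rough split of wall-clock time between replay sampling and the gradient step."""
    sample_time, update_time = 0.0, 0.0
    for _ in range(n_iters):
        t0 = time.perf_counter()
        batch = buffer.sample(batch_size)
        t1 = time.perf_counter()
        out = jitted_update(batch)
        # JAX dispatches asynchronously, so wait for the result before stopping the timer
        jax.block_until_ready(out)
        t2 = time.perf_counter()
        sample_time += t1 - t0
        update_time += t2 - t1
    print(f"sampling: {sample_time:.2f}s, update: {update_time:.2f}s over {n_iters} iterations")
```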
> Does SBX/Jax mean this much speed improvement? With the right parameters (see the exact command line arguments for the RL Zoo in the OpenRL Benchmark organization runs on W&B),...
> Add next_observations and dones fields to the RolloutBuffer and the DictRolloutBuffer classes, similar to how it is done in the ReplayBuffer class. dones are stored in `episode_starts` (shifted by...
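Concretely, `episode_starts[t]` is the done flag of step `t - 1`, so dones can be recovered by shifting; a tiny illustration with made-up arrays (not RolloutBuffer code):

```python
import numpy as np

# episode_starts[t] is True when step t begins a new episode,
# i.e. when the previous step t - 1 ended with done=True (shifted by one)
episode_starts = np.array([True, False, False, True, False])

# Recover dones for steps 0..T-2 by shifting episode_starts left by one;
# the done flag of the very last step is not contained in this array
dones = episode_starts[1:].copy()
print(dones)  # [False False  True False]
```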
Hello, > I suggest we add a log with verbose=2 that describes if preprocess_obs normalized any of the input for the network. Where exactly do you want to print additional...
> I would suggest doing it at the beginning of the training, the same way we display this kind of log: I see. It will be a bit harder in that...
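A rough sketch of what such a start-of-training message could look like (the `describe_preprocessing` helper and its arguments are hypothetical; it only relies on `is_image_space` from `stable_baselines3.common.preprocessing`):

```python
from gymnasium import spaces
from stable_baselines3.common.preprocessing import is_image_space


def describe_preprocessing(observation_space, normalize_images: bool = True, verbose: int = 0) -> None:
    """Print which observation (sub)spaces will have their pixels normalized."""
    if verbose < 2:
        return
    sub_spaces = (
        observation_space.spaces.items()
        if isinstance(observation_space, spaces.Dict)
        else [("observation", observation_space)]
    )
    for key, space in sub_spaces:
        if isinstance(space, spaces.Box) and is_image_space(space) and normalize_images:
            print(f"{key}: image space detected, pixels will be normalized to [0, 1]")
        else:
            print(f"{key}: no image normalization applied")
```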
See https://github.com/DLR-RM/stable-baselines3/issues/622
> or you're still waiting for contributions? We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.
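For reference, the core of such an implementation is a sum tree, which gives O(log N) proportional sampling and priority updates; a minimal sketch (written from scratch here, not taken from the linked repository):

```python
import numpy as np


class SumTree:
    """Binary tree where each parent stores the sum of its children's priorities."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Internal nodes in tree[:capacity - 1], leaf priorities in tree[capacity - 1:]
        self.tree = np.zeros(2 * capacity - 1, dtype=np.float64)

    def update(self, data_index: int, priority: float) -> None:
        # Update a leaf and propagate the change up to the root
        tree_index = data_index + self.capacity - 1
        change = priority - self.tree[tree_index]
        self.tree[tree_index] = priority
        while tree_index > 0:
            tree_index = (tree_index - 1) // 2
            self.tree[tree_index] += change

    def get(self, value: float) -> int:
        # Descend the tree to find the leaf whose cumulative sum contains `value`
        index = 0
        while 2 * index + 1 < len(self.tree):
            left = 2 * index + 1
            if value <= self.tree[left]:
                index = left
            else:
                value -= self.tree[left]
                index = left + 1
        return index - (self.capacity - 1)  # convert back to a data index

    @property
    def total(self) -> float:
        return self.tree[0]
```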