Antonin RAFFIN

Results: 880 comments by Antonin RAFFIN

Hello, I tried but couldn't test the PR; I got an error (before my changes) with both Pendulum and BipedalWalker: ``` Traceback (most recent call last): File "sb3_contrib/whole_sequence_speed_test.py", line 167,...

I had to set `drop_last=False` sometimes, otherwise I got an error because nothing was sampled: `UnboundLocalError: local variable 'loss' referenced before assignment` To reproduce: ``` python -m...
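As a minimal, self-contained sketch of that failure mode (not the actual buffer/training code): when `drop_last=True` and fewer samples than one full batch are available, the minibatch loop never runs, so the local variable `loss` is never assigned and referencing it afterwards raises exactly that `UnboundLocalError`.

```
import torch as th
from torch.utils.data import DataLoader, TensorDataset


def train_once(num_samples: int = 6, batch_size: int = 8, drop_last: bool = True) -> float:
    # Hypothetical dataset with fewer samples than one full batch
    dataset = TensorDataset(th.randn(num_samples, 4))
    loader = DataLoader(dataset, batch_size=batch_size, drop_last=drop_last)

    for (obs,) in loader:
        # `loss` is only ever bound inside the loop body
        loss = (obs ** 2).mean()

    # With drop_last=True and num_samples < batch_size, the loader yields
    # nothing, so the next line raises:
    # UnboundLocalError: local variable 'loss' referenced before assignment
    return loss.item()


print(train_once(drop_last=False))  # works: the single short batch is kept
# train_once(drop_last=True)        # raises UnboundLocalError
```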

There is also an error when using a CNN: ``` python train.py --algo ppo_lstm --env CarRacing-v2 -P --n-eval-envs 5 --eval-episodes 20 -params batch_size:8 whole_sequences:True ``` ``` self.train() File "/home/antonin/Documents/rl/sb3-contrib/sb3_contrib/ppo_recurrent/ppo_recurrent.py", line 377, in train...

> On CartPole, I have another error: The CartPole error still seems to be there...

https://github.com/openai/gym/issues/3176#issuecomment-1560026649

Because they are not created here, but there: https://github.com/DLR-RM/rl-baselines3-zoo/blob/e06914e9835b8f3233b18d59943b1464b89ddb90/rl_zoo3/exp_manager.py#L743 See the comment just above: why is the number of parallel environments (n_envs) set to one when optimizing hyperparameters? https://github.com/DLR-RM/rl-baselines3-zoo/blob/e06914e9835b8f3233b18d59943b1464b89ddb90/rl_zoo3/exp_manager.py#L197

Hello, > The batch size is now always higher than the specified 100, and different every time. My guess is that this is because of padding/masking. https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/a9735b9f317be4283e56d221e19087b926ca9ec0/sb3_contrib/ppo_recurrent/ppo_recurrent.py#L369 `rollout_data.observations[mask].shape` should be...
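As a rough illustration of that padding/masking effect (hypothetical shapes, not the actual buffer code linked above): sequences of unequal length are padded to the longest one, so the flattened minibatch contains more rows than real transitions, and the boolean mask is what recovers the real samples.

```
import torch as th
from torch.nn.utils.rnn import pad_sequence

obs_dim = 4
seq_lengths = [3, 7, 5]  # 15 real transitions in total
sequences = [th.randn(length, obs_dim) for length in seq_lengths]

# Padding every sequence to the longest one (length 7) inflates the batch
padded = pad_sequence(sequences, batch_first=True)  # (3, 7, 4) -> 21 rows
mask = pad_sequence(
    [th.ones(length, dtype=th.bool) for length in seq_lengths], batch_first=True
)  # (3, 7), False where padded

flat_obs = padded.reshape(-1, obs_dim)
print(flat_obs.shape)                     # torch.Size([21, 4]): more than the 15 real samples
print(flat_obs[mask.reshape(-1)].shape)   # torch.Size([15, 4]): padding filtered out
```

Since the number of padded rows depends on how the sampled episodes are cut, this would also explain why the effective size varies from one minibatch to the next.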

> the purpose of padding in this case, as the data is fed sequentially into the model (seq_length * obs_size) What you want to feed is (batch_size, obs_shape) sequentially: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/a9735b9f317be4283e56d221e19087b926ca9ec0/sb3_contrib/common/recurrent/buffers.py#L231...
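A minimal sketch of that intended data flow, using a plain `nn.LSTM` (illustrative only, not the recurrent policy code): the network receives one time step of shape (batch_size, obs_dim) at a time while the hidden state is carried over, rather than a single flattened vector of size seq_length * obs_size per sample.

```
import torch as th
import torch.nn as nn

batch_size, seq_len, obs_dim, hidden_dim = 8, 5, 4, 16
lstm = nn.LSTM(obs_dim, hidden_dim)

# Hypothetical batch of sequences: (seq_len, batch_size, obs_dim)
observations = th.randn(seq_len, batch_size, obs_dim)

hidden = None  # (h, c) carried from one step to the next
for t in range(seq_len):
    step = observations[t].unsqueeze(0)  # (1, batch_size, obs_dim)
    output, hidden = lstm(step, hidden)  # each step sees (batch_size, obs_dim)

print(output.shape)  # torch.Size([1, 8, 16])
```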

> with an LSTM was running at only 104 fps on a 24 core machine, against normal PPO with an MLP at around 1000 fps on the same machine. This...