Parallel rollout implementation in HER+DDPG?

Open RyanRizzo96 opened this issue 6 years ago • 2 comments

In ddpg.py, the parameter nb_rollout_steps is an integer containing the number of rollout steps. I believe that this is the same as the parameter T in OpenAI baselines which refers to "the time horizon for rollouts" as they put it.

My question is: where is the number of parallel rollouts per DDPG agent implemented in stable baselines? In OpenAI Baselines this value is passed as rollout_batch_size when initializing DDPG.

Any suggestions would be appreciated.

RyanRizzo96, Nov 18 '19 21:11

Hello,

It seems you are talking about the custom DDPG implementation that OpenAI created for HER. To be honest, that one is quite confusing and relies on a lot of tricks, which is also why we rewrote HER completely.

> the number of parallel rollouts per DDPG agents implemented in stable baselines?

If you explain to me what it is, then I can maybe give you the equivalent. I don't really get what "number of parallel rollouts" means. Is it a number of episodes, or a number of parallel agents?

Note that the DDPG implementation in stable-baselines is the one from the original baselines (not the custom one made for HER).

araffin, Nov 19 '19 14:11

Hi,

Yes, I am talking about the custom DDPG implementation. In Plappert et al. (2018), 38 trajectories were generated in parallel (19 MPI processes, each generating 2 trajectories, computing gradients from them, and aggregating).
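As a rough sketch of that arithmetic, assuming the per-process count of 2 is what rollout_batch_size refers to (the variable names below are hypothetical and only illustrate the numbers quoted above):

```python
# Hypothetical names; the values are the ones quoted from Plappert et al. (2018).
num_mpi_processes = 19
rollouts_per_process = 2  # assumed to correspond to rollout_batch_size

# Total number of trajectories generated in parallel across all processes.
total_parallel_trajectories = num_mpi_processes * rollouts_per_process
print(total_parallel_trajectories)
```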

Their code comment states:

https://github.com/openai/baselines/blob/9ee399f5b20cd70ac0a871927a6cf043b478193f/baselines/her/ddpg.py#L50

I think that this refers to the set of trajectories simulated in parallel. Maybe the image below will help show what I mean.

image
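To make the idea concrete, here is a minimal, self-contained sketch of what "parallel rollouts" could look like: several independent episodes advanced in lockstep for T steps each. This is not the baselines or stable-baselines code. The toy dynamics, the random policy, and the function names are all assumptions made for illustration; only the parameter names rollout_batch_size and T come from the discussion above.

```python
import random


def toy_step(state, action):
    """Hypothetical 1-D dynamics: the state simply moves by the action."""
    return state + action


def collect_rollouts(rollout_batch_size, T, seed=0):
    """Advance `rollout_batch_size` independent episodes in lockstep for T steps.

    Returns one trajectory (a list of T + 1 states) per episode.
    """
    rng = random.Random(seed)
    states = [0.0] * rollout_batch_size
    trajectories = [[s] for s in states]
    for _ in range(T):
        # Stand-in for the policy: one random action per parallel episode.
        actions = [rng.uniform(-1.0, 1.0) for _ in states]
        states = [toy_step(s, a) for s, a in zip(states, actions)]
        for trajectory, s in zip(trajectories, states):
            trajectory.append(s)
    return trajectories


# T = 50 is an arbitrary horizon chosen for this example.
batch = collect_rollouts(rollout_batch_size=2, T=50)
print(len(batch), len(batch[0]))
```

In this picture, T would play the role of nb_rollout_steps (the length of each rollout), while rollout_batch_size would be the number of such rollouts collected side by side.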

RyanRizzo96, Nov 19 '19 15:11