Multiple worker support

Open richardrl opened this issue 6 years ago • 10 comments

Is there support for multiple workers (threads each feeding in independent samples)?

I'm trying to reproduce the pick-and-place results here, and they mention needing 19 workers to do so: https://github.com/openai/baselines/tree/master/baselines/her

On the other hand, since HER is off-policy, it seems like multiple workers shouldn't be necessary.

richardrl avatar Jan 17 '19 05:01 richardrl

No. However, if you're interested in implementing it, I would start with the batch training mode and just replace the "_take_step_in_env" call with some parallelized implementation.

vitchyr avatar Jan 17 '19 20:01 vitchyr
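
As a rough illustration of what a parallelized replacement for a single "_take_step_in_env" call could look like, here is a minimal sketch. It does not use rlkit's API: `ToyEnv`, `collect_steps`, and `parallel_collect` are hypothetical names, the policy is a random placeholder, and a thread pool is assumed only to keep the example short. For a CPU-bound simulator, separate processes would probably be needed to get a real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class ToyEnv:
    """Hypothetical stand-in for a real environment (e.g. a MuJoCo task)."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0.0

    def step(self, action):
        # Random-walk dynamics; reward is higher when the state is near zero.
        self.state += action + self.rng.uniform(-0.1, 0.1)
        reward = -abs(self.state)
        return self.state, reward


def collect_steps(worker_id, num_steps):
    """Run one worker's own environment and return its transitions."""
    env = ToyEnv(seed=worker_id)
    policy_rng = random.Random(1000 + worker_id)
    transitions = []
    obs = env.state
    for _ in range(num_steps):
        action = policy_rng.uniform(-1.0, 1.0)  # placeholder random policy
        next_obs, reward = env.step(action)
        transitions.append((obs, action, reward, next_obs))
        obs = next_obs
    return transitions


def parallel_collect(num_workers, num_steps):
    """Collect num_steps transitions from each of num_workers environments."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_worker = pool.map(
            collect_steps, range(num_workers), [num_steps] * num_workers
        )
        # Flatten into one list that could be added to a shared replay buffer.
        return [t for worker in per_worker for t in worker]
```

Calling `parallel_collect(4, 5)` returns 20 `(obs, action, reward, next_obs)` tuples, which a batch-mode training loop could then push into its replay buffer before the next round of gradient updates.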

I'm not planning on adding this any time soon, though I've been talking with some people at Berkeley who might be interested in doing this, either as part of this repo or separately. PRs would also be welcomed, and if you're interested in this let me know.

vitchyr avatar Jan 17 '19 20:01 vitchyr

I'm going to re-open this and label it as an enhancement request. Though like I said, I probably won't get around to it any time soon.

vitchyr avatar Jan 19 '19 02:01 vitchyr

Is there any progress on this?

bycn avatar Apr 08 '20 22:04 bycn

@bycn You can check how I implemented it here: https://github.com/richardrl/rlkit-relational/blob/master/rlkit/torch/optim/mpi_adam.py

richardrl avatar Apr 09 '20 16:04 richardrl

Thanks for the reply. Actually, I ended up realizing I'm looking for parallelized sampling, not parallelized consumption.

bycn avatar Apr 09 '20 19:04 bycn

@bycn Parallel sampling is also implemented. Check how "num_parallel_processes" is used here: https://github.com/richardrl/rlkit-relational/blob/master/examples/relationalrl/train_pickandplace1.py

richardrl avatar Apr 09 '20 23:04 richardrl

It's unclear to me how parallel sampling is implemented here without vectorized environments, i.e., like https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html. Can you explain the approach?

bycn avatar Apr 09 '20 23:04 bycn

I don't know what the vectorized environments are doing. I am simply starting a MuJoCo instance in each of multiple threads, with a local replay buffer in each thread. Each thread samples its environment to fill its local replay buffer and computes the gradient, and the gradients are then averaged in the optimizer file you see above.

richardrl avatar Apr 12 '20 08:04 richardrl
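
The scheme described above (per-worker local buffers, per-worker gradients, then an average) can be sketched in a few lines. This is not the code from the linked mpi_adam.py: it assumes a toy one-parameter model `y = w * x`, plain SGD instead of Adam, and a sequential loop over workers instead of MPI. All function names are hypothetical.

```python
import random


def make_local_buffer(seed, size, true_w=3.0):
    """Hypothetical worker: fill a local buffer with its own (x, y) samples."""
    rng = random.Random(seed)
    buf = []
    for _ in range(size):
        x = rng.uniform(-1.0, 1.0)
        buf.append((x, true_w * x))
    return buf


def local_gradient(w, buffer):
    """Gradient of the mean squared error of y = w * x on one local buffer."""
    return sum(2.0 * (w * x - y) * x for x, y in buffer) / len(buffer)


def train(num_workers=4, buffer_size=32, steps=200, lr=0.5):
    # Each worker samples independently into its own buffer.
    buffers = [make_local_buffer(seed, buffer_size) for seed in range(num_workers)]
    w = 0.0
    for _ in range(steps):
        # Every worker computes a gradient from its local data only.
        grads = [local_gradient(w, buf) for buf in buffers]
        # The shared update uses the average of the per-worker gradients.
        avg_grad = sum(grads) / len(grads)
        w -= lr * avg_grad
    return w
```

With equally sized buffers, averaging the per-worker gradients gives the same update as one gradient over the union of all buffers, so every worker stays on the same parameters while only gradients, not samples, need to be exchanged.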

If you have further questions about my implementation, feel free to open an issue in that repo.

richardrl avatar Apr 12 '20 08:04 richardrl