rlkit Multiple worker support

Is there support for multiple workers (threads each feeding in independent samples)?

I'm trying to reproduce the pick-and-place results from the OpenAI Baselines HER implementation, which mentions needing 19 workers to do so: https://github.com/openai/baselines/tree/master/baselines/her

On the other hand, since HER is off-policy, it seems like multiple workers shouldn't be necessary.

Jan 17 '19 05:01 richardrl

No. However, if you're interested in implementing it, I would start with the batch training mode and just replace the "_take_step_in_env" call with some parallelized implementation.

Jan 17 '19 20:01 vitchyr
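A minimal sketch of the suggestion above, assuming a gym-style `reset()`/`step()` interface. Each hypothetical `RolloutWorker` owns one environment and keeps its current observation between calls, and `parallel_take_steps` steps all workers concurrently, then flattens their transitions so they can go into a single replay buffer. `ToyEnv`, `RolloutWorker`, and `parallel_take_steps` are illustrative names, not rlkit APIs, and this does not reproduce the actual signature of `_take_step_in_env`.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# (obs, action, reward, next_obs, done)
Transition = Tuple[float, float, float, float, bool]


class ToyEnv:
    """Hypothetical stand-in for a gym-style env: a noisy 1-D random walk."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)
        self.state = 0.0

    def reset(self) -> float:
        self.state = 0.0
        return self.state

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.state += action + self.rng.gauss(0.0, 0.1)
        reward = -abs(self.state)
        done = abs(self.state) > 5.0
        return self.state, reward, done


class RolloutWorker:
    """Hypothetical worker: owns one env and its current observation across calls."""

    def __init__(self, seed: int):
        self.env = ToyEnv(seed)
        self.policy_rng = random.Random(seed + 10_000)
        self.obs = self.env.reset()

    def take_steps(self, n: int) -> List[Transition]:
        out: List[Transition] = []
        for _ in range(n):
            action = self.policy_rng.uniform(-1.0, 1.0)  # placeholder random policy
            next_obs, reward, done = self.env.step(action)
            out.append((self.obs, action, reward, next_obs, done))
            # Reset on episode end so the next call continues seamlessly.
            self.obs = self.env.reset() if done else next_obs
        return out


def parallel_take_steps(workers: List[RolloutWorker], n: int) -> List[Transition]:
    """Hypothetical parallel replacement for a single-env step loop."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        per_worker = list(pool.map(lambda w: w.take_steps(n), workers))
    # Flatten (in worker order) so the batch can be added to one replay buffer.
    return [t for batch in per_worker for t in batch]
```

For example, `parallel_take_steps([RolloutWorker(s) for s in range(4)], 50)` returns 200 transitions. Threads only speed this up if the simulator's `step()` releases the GIL; for pure-Python or GIL-bound environments, the same structure with a process pool (e.g. `multiprocessing`) is the usual choice.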

I'm not planning on adding this any time soon, though I've been talking with some people at Berkeley who might be interested in doing this, either as part of this repo or separately. PRs are also welcome, so if you're interested in this, let me know.

Jan 17 '19 20:01 vitchyr

I'm going to re-open this and label it as an enhancement request. Though, like I said, I probably won't get around to it any time soon.

Jan 19 '19 02:01 vitchyr

Is there any progress on this?

Apr 08 '20 22:04 bycn

@bycn You can check how I implemented it here: https://github.com/richardrl/rlkit-relational/blob/master/rlkit/torch/optim/mpi_adam.py

Apr 09 '20 16:04 richardrl

Thanks for the reply. Actually, I realized I'm looking for parallelized sampling, not parallelized consumption.

Apr 09 '20 19:04 bycn

@bycn Parallel sampling is also implemented. Check how "num_parallel_processes" is used here: https://github.com/richardrl/rlkit-relational/blob/master/examples/relationalrl/train_pickandplace1.py

Apr 09 '20 23:04 richardrl

It's unclear to me how parallel sampling is implemented here without vectorized environments, such as those described in https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html. Can you explain the approach?

Apr 09 '20 23:04 bycn
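For reference, a vectorized environment is essentially a wrapper that holds N independent environments, steps them with a batch of N actions, and returns batched observations, rewards, and done flags, resetting any environment whose episode has ended. Below is a minimal sequential sketch of that pattern, similar in spirit to Stable Baselines' `DummyVecEnv`; `CountdownEnv` and `SimpleVecEnv` are hypothetical names, and the three-value `step()` return is a simplification of the gym interface.

```python
from typing import Callable, List, Tuple


class CountdownEnv:
    """Hypothetical toy env: reward equals the action; episode ends after `horizon` steps."""

    def __init__(self, horizon: int):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: float) -> Tuple[int, float, bool]:
        self.t += 1
        return self.t, float(action), self.t >= self.horizon


class SimpleVecEnv:
    """Steps N independent envs in lockstep and returns batched results (sequential version)."""

    def __init__(self, env_fns: List[Callable[[], CountdownEnv]]):
        self.envs = [fn() for fn in env_fns]

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[float]) -> Tuple[List[int], List[float], List[bool]]:
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:
                o = env.reset()  # auto-reset so the batch never stalls on a finished episode
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones
```

For example, `SimpleVecEnv([lambda h=h: CountdownEnv(h) for h in (2, 3)])` builds two envs with different episode lengths (the `h=h` default avoids Python's late-binding closure pitfall). A subprocess-based variant would run each env's `step()` in its own worker process and gather the results in the same batched form.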

I don't know what vectorized environments do, but I simply start a MuJoCo instance in each of several threads, each with its own local replay buffer. Each thread samples its environment to fill the local replay buffer and computes a gradient, and the gradients are then averaged in the optimizer file linked above.

Apr 12 '20 08:04 richardrl
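A minimal sketch of the data-parallel scheme described above, under simplifying assumptions: each hypothetical `Worker` has its own data source (standing in for a private MuJoCo instance) and its own local replay buffer, computes the gradient of a squared-error loss for a single scalar parameter on a minibatch drawn from that buffer, and the per-worker gradients are averaged before one shared update. The in-process average stands in for an MPI all-reduce divided by the number of workers; it is not the code in the linked `mpi_adam.py`, and `Worker`, `train`, and `TRUE_W` are illustrative names.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TRUE_W = 3.0  # hypothetical target the workers' data is generated from


class Worker:
    """One worker: its own data source and its own local replay buffer."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)
        self.buffer = []  # local replay buffer of (x, y) pairs

    def collect(self, n: int) -> None:
        # Stand-in for stepping a private simulator instance.
        for _ in range(n):
            x = self.rng.uniform(-1.0, 1.0)
            self.buffer.append((x, TRUE_W * x))

    def gradient(self, w: float, batch_size: int) -> float:
        # d/dw of mean((w * x - y)^2) over a minibatch from the local buffer.
        batch = self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))
        return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(num_workers: int = 4, iters: int = 200, lr: float = 0.5, batch_size: int = 16) -> float:
    workers = [Worker(seed) for seed in range(num_workers)]
    w = 0.0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(iters):
            # Each worker fills its own buffer, then computes a local gradient.
            list(pool.map(lambda wk: wk.collect(8), workers))
            grads = list(pool.map(lambda wk: wk.gradient(w, batch_size), workers))
            # Average the gradients (what an all-reduce sum / num_workers would give).
            w -= lr * sum(grads) / len(grads)
    return w
```

Here `train()` converges to `TRUE_W`. In a multi-process setup each worker holds its own copy of the parameters; applying the same averaged gradient everywhere is what keeps those copies in sync, so every worker keeps sampling with the same policy.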

If you have further questions about my implementation, feel free to open an issue in that repo.

Apr 12 '20 08:04 richardrl
