oprl
A Modular Library for Off-Policy Reinforcement Learning with a focus on SafeRL and distributed computing
D4PG-pytorch
PyTorch implementation of Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617).
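In D4PG the critic models the return as a categorical distribution over a fixed set of atoms. The distributional Bellman target r + γZ no longer lands on those atoms, so it is projected back onto the support, as in the D4PG paper. Below is a minimal pure-Python sketch of that projection for a single transition. It assumes a uniform support between `v_min` and `v_max` with at least two atoms. The function name and signature are hypothetical and not taken from this repository; a practical implementation would operate on batched tensors. When n-step returns are used, `discount` stands for γ^n.

```python
import math
from typing import List


def categorical_projection(
    probs: List[float],
    reward: float,
    discount: float,
    done: bool,
    v_min: float,
    v_max: float,
) -> List[float]:
    """Project the shifted distribution reward + discount * Z onto the fixed support.

    Hypothetical helper: probs[j] is the probability of atom z_j = v_min + j * delta_z.
    """
    n_atoms = len(probs)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    projected = [0.0] * n_atoms
    for j, p in enumerate(probs):
        z_j = v_min + j * delta_z
        # Bellman-shifted atom; terminal transitions keep only the reward.
        tz = reward + (0.0 if done else discount) * z_j
        tz = min(max(tz, v_min), v_max)
        # Fractional index on the support, clamped to guard against float round-off.
        b = min(max((tz - v_min) / delta_z, 0.0), n_atoms - 1)
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            # The shifted atom falls exactly on a support point.
            projected[lower] += p
        else:
            # Split the mass between the two neighbouring atoms, linearly by distance.
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected


# Five atoms on [-1, 1]; all mass on z = 0, shifted by a reward of 0.25.
print(categorical_projection([0.0, 0.0, 1.0, 0.0, 0.0], 0.25, 0.99, False, -1.0, 1.0))
# [0.0, 0.0, 0.5, 0.5, 0.0]
```

The projected distribution is the target against which the critic's predicted distribution is trained with a cross-entropy loss.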
About
D4PG and D3PG implementations with the following features:
- the learner, the sampler and the agents run in separate processes
- one or more exploiter agents act on the target network without action noise
- the GPU is held only by the exploiters; all other exploration processes run on the CPU
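The split between exploring agents and noise-free exploiters can be sketched as a single action-selection function. This is a minimal sketch under assumptions: the `select_action` name, the i.i.d. Gaussian noise, the `noise_std` value and the [-1, 1] action bounds are all illustrative and not taken from this repository, and `target_actor` is a hypothetical stand-in for the target actor network.

```python
import random
from typing import Callable, List, Optional

# Hypothetical type: an actor maps a state vector to an action vector.
Policy = Callable[[List[float]], List[float]]


def select_action(
    policy: Policy,
    state: List[float],
    exploit: bool,
    noise_std: float = 0.1,
    low: float = -1.0,
    high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a clipped action; explorers add Gaussian noise, exploiters do not."""
    action = policy(state)
    if not exploit:
        # Assumed exploration scheme: i.i.d. Gaussian noise per action dimension.
        rng = rng or random.Random()
        action = [a + rng.gauss(0.0, noise_std) for a in action]
    # Keep actions inside the (assumed) environment bounds.
    return [min(max(a, low), high) for a in action]


def target_actor(state: List[float]) -> List[float]:
    """Hypothetical stand-in for the target actor network."""
    return [0.5, -2.0]


greedy = select_action(target_actor, [0.0, 0.0], exploit=True)
print(greedy)  # [0.5, -1.0]
```

An exploiter calls this with `exploit=True` and reports the undisturbed performance of the target policy, while explorers call it with `exploit=False` to generate diverse transitions for the replay buffer.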
The project was tested on Ubuntu 18.04 with an Intel i5 (4 cores) and an Nvidia GTX 1080Ti.
Usage
To train D4PG on Walker2d, run python train.py --config configs/openai/d4pg/walker2d_d4pg.yml
Tests
python -m unittest discover
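`python -m unittest discover` searches the current directory for modules whose names match the default pattern `test*.py` and runs every `unittest.TestCase` it finds; subdirectories generally need an `__init__.py` to be searched. The module below is a hypothetical example of that shape: the file name (for example `test_clip.py`) and the `clip` helper are illustrative and not part of this repository.

```python
import unittest


def clip(x: float, low: float, high: float) -> float:
    """Hypothetical helper under test; stands in for a real project function."""
    return min(max(x, low), high)


class TestClip(unittest.TestCase):
    def test_inside_range_is_unchanged(self):
        self.assertEqual(clip(0.3, -1.0, 1.0), 0.3)

    def test_outside_range_is_clipped(self):
        self.assertEqual(clip(2.0, -1.0, 1.0), 1.0)
        self.assertEqual(clip(-5.0, -1.0, 1.0), -1.0)
```

Saved under a matching file name, these two test methods are picked up by discovery without any `unittest.main()` call.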
Results
Configs for reproducing the curves below can be found in the configs
directory (number of parallel agents = 4).
OpenAI Mujoco
DMControl
Reproduce
All results were obtained with the configs in the configs
directory.
References
- Lillicrap et al., Continuous Control with Deep Reinforcement Learning [https://arxiv.org/abs/1509.02971]
- Barth-Maron et al., Distributed Distributional Deterministic Policy Gradients [https://arxiv.org/abs/1804.08617]