
A Modular Library for Off-Policy Reinforcement Learning with a focus on SafeRL and distributed computing

D4PG-pytorch

PyTorch implementation of Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617).

(Figure: D4PG architecture diagram)

The implementation was tested on environments from OpenAI Gym.
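
The critic in D4PG is distributional: it predicts a categorical distribution over returns on a fixed support of atoms, and the Bellman target is projected back onto that support before a cross-entropy loss is applied. The following is a minimal sketch of that categorical projection; the atom count, value range, and all names are illustrative and not taken from this repository.

```python
# Minimal sketch of the categorical (C51-style) target projection used by D4PG.
# N_ATOMS, V_MIN, V_MAX and all function names are illustrative assumptions.
import torch

N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)
SUPPORT = torch.linspace(V_MIN, V_MAX, N_ATOMS)  # fixed atom locations z_i


def project_target(next_probs: torch.Tensor, rewards: torch.Tensor,
                   dones: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Project r + gamma * Z(s', a') onto the fixed support.

    next_probs: (batch, N_ATOMS) probabilities from the target critic
    rewards, dones: (batch,) tensors
    """
    # Shifted atom locations, clamped to the support range
    tz = (rewards.unsqueeze(1)
          + gamma * (1.0 - dones).unsqueeze(1) * SUPPORT).clamp(V_MIN, V_MAX)
    b = (tz - V_MIN) / DELTA_Z                 # fractional index of each shifted atom
    lower, upper = b.floor().long(), b.ceil().long()

    projected = torch.zeros_like(next_probs)
    # Split each atom's probability mass between its two neighbouring atoms
    projected.scatter_add_(1, lower, next_probs * (upper.float() - b))
    projected.scatter_add_(1, upper, next_probs * (b - lower.float()))
    # When b lands exactly on an atom, both weights above are zero:
    # assign the full mass to that atom instead
    eq_mask = (upper == lower).float()
    projected.scatter_add_(1, lower, next_probs * eq_mask)
    return projected  # target distribution for the cross-entropy critic loss
```

The critic is then trained by minimizing the cross-entropy between its predicted distribution and this projected target, while the actor follows the usual deterministic policy gradient through the critic's expected value.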

About

D4PG and D3PG implementations with the following features:

  • the learner, sampler, and agents run in separate processes
  • exploiter agent(s) act without exploration noise, using the target network
  • the GPU is held only by the exploiters; all other exploration processes run on the CPU (see the sketch below)

The project was tested on Ubuntu 18.04 with a 4-core Intel i5 and an Nvidia GTX 1080 Ti.
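
As a rough illustration of the process layout described above, the sketch below uses torch.multiprocessing to run noisy exploration agents on the CPU, a noise-free exploiter on the GPU (when available), and a learner that consumes transitions from a shared queue standing in for the real replay buffer. All names, network shapes, and step counts are placeholders, not the library's actual API.

```python
# Minimal sketch of the process layout (not the library's actual code):
# CPU exploration agents with action noise, a GPU exploiter without noise,
# and a learner consuming transitions from a shared queue.
import torch
import torch.multiprocessing as mp

OBS_DIM, ACT_DIM = 17, 6          # e.g. Walker2d-like dimensions (placeholder)
STEPS_PER_AGENT, NUM_AGENTS = 50, 4


def exploration_agent(agent_id: int, queue: mp.Queue):
    policy = torch.nn.Linear(OBS_DIM, ACT_DIM)        # stand-in for the actor, on CPU
    obs = torch.randn(OBS_DIM)
    for _ in range(STEPS_PER_AGENT):
        with torch.no_grad():
            action = policy(obs) + 0.1 * torch.randn(ACT_DIM)  # Gaussian exploration noise
        queue.put((agent_id, obs, action))             # transition -> learner/replay
        obs = torch.randn(OBS_DIM)                     # stand-in for env.step(...)


def exploiter():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    target_policy = torch.nn.Linear(OBS_DIM, ACT_DIM).to(device)
    obs = torch.randn(OBS_DIM, device=device)
    for _ in range(STEPS_PER_AGENT):
        with torch.no_grad():
            action = target_policy(obs)                # evaluates without noise
        obs = torch.randn(OBS_DIM, device=device)


def learner(queue: mp.Queue):
    for _ in range(STEPS_PER_AGENT * NUM_AGENTS):
        agent_id, obs, action = queue.get()            # consume sampled transitions
        # ... compute the D4PG critic/actor losses and update the networks here ...


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    queue = mp.Queue()
    procs = [mp.Process(target=exploration_agent, args=(i, queue))
             for i in range(NUM_AGENTS)]
    procs += [mp.Process(target=exploiter), mp.Process(target=learner, args=(queue,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```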

Usage

Run python train.py --config configs/openai/d4pg/walker2d_d4pg.yml

Tests

python -m unittest discover

Results

Configs for reproducing the curves below can be found in the configs directory (number of parallel agents = 4).

OpenAI MuJoCo

(Figure: D4PG results on OpenAI MuJoCo environments)

DMControl

(Figure: D4PG results on DMControl environments)

Reproduce

All results were obtained with the configs in the configs directory.

References

  • Continuous control with deep reinforcement learning, https://arxiv.org/abs/1509.02971
  • Distributed Distributional Deterministic Policy Gradients, https://arxiv.org/abs/1804.08617