Deeper_Larger_Actor-Critic_RL

PyTorch implementation of large network designs for continuous-control RL.

Deeper and Larger Network Design for Continuous Control in RL

Implementation of large network designs in RL, with easy switching between toy tasks and challenging games. The code mainly follows three recent papers:

In the code, we denote the method from Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? as ofe, the method from D2RL: Deep Dense Architectures in Reinforcement Learning as d2rl, and the method from Training Larger Networks for Deep Reinforcement Learning as ofe_dense. Note that we only implement the single-machine approach for ofe_dense, and we observe overfitting with it; we speculate that this is because the single-machine version is less stable than the distributed approach.
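
As a concrete illustration of the d2rl design, below is a minimal PyTorch sketch of a dense-connection critic, assuming the standard D2RL recipe of concatenating the raw (state, action) input to the output of every hidden layer. The class name and hyperparameters are illustrative and are not the modules used in this repository.

# Minimal sketch of a D2RL-style Q-network (illustrative, not the repo's exact code).
import torch
import torch.nn as nn

class D2RLCritic(nn.Module):
    """Q-network with D2RL-style dense connections: the raw (state, action)
    input is concatenated to the output of every hidden layer."""

    def __init__(self, state_dim, action_dim, hidden_dim=256, num_layers=4):
        super().__init__()
        in_dim = state_dim + action_dim
        self.hidden = nn.ModuleList(
            [nn.Linear(in_dim if i == 0 else hidden_dim + in_dim, hidden_dim)
             for i in range(num_layers)]
        )
        self.q_out = nn.Linear(hidden_dim, 1)

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        h = x
        for i, layer in enumerate(self.hidden):
            h = torch.relu(layer(h))
            if i < len(self.hidden) - 1:       # the last hidden layer feeds q_out directly
                h = torch.cat([h, x], dim=-1)  # dense connection back to the raw input
        return self.q_out(h)

# Example: q_values = D2RLCritic(17, 6)(torch.randn(32, 17), torch.randn(32, 6))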

Supported algorithms

| algorithm | continuous control | on-policy / off-policy |
| --------- | :----------------: | ---------------------- |
| Proximal Policy Optimization (PPO) coupled with d2rl | :white_check_mark: | on-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with d2rl | :white_check_mark: | off-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with ofe | :white_check_mark: | off-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with ofe_dense | :white_check_mark: | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with d2rl | :white_check_mark: | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe | :white_check_mark: | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe_dense | :white_check_mark: | off-policy |
| Soft Actor-Critic (SAC) coupled with d2rl | :white_check_mark: | off-policy |
| Soft Actor-Critic (SAC) coupled with ofe | :white_check_mark: | off-policy |
| Soft Actor-Critic (SAC) coupled with ofe_dense | :white_check_mark: | off-policy |
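
For intuition about what the ofe variants listed above add, the sketch below outlines an OFENet-style feature extractor in the spirit of Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?: DenseNet-like MLP blocks map (state, action) to higher-dimensional features z_s and z_sa, trained with an auxiliary next-observation prediction loss, and the actor/critic consume these features instead of the raw input. Class names, layer sizes, and the auxiliary head here are illustrative assumptions, not this repository's actual modules.

# Rough sketch of an OFENet-style extractor (illustrative assumptions, not the repo's code).
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """One MLP-DenseNet layer: output is [input, relu(Linear(input))]."""
    def __init__(self, in_dim, growth):
        super().__init__()
        self.fc = nn.Linear(in_dim, growth)

    def forward(self, x):
        return torch.cat([x, torch.relu(self.fc(x))], dim=-1)

class OFENet(nn.Module):
    """Maps (s, a) to higher-dimensional features z_s and z_sa; trained with an
    auxiliary next-observation prediction loss, separate from the RL loss."""
    def __init__(self, state_dim, action_dim, growth=40, num_blocks=4):
        super().__init__()
        self.state_blocks = nn.ModuleList()
        dim = state_dim
        for _ in range(num_blocks):
            self.state_blocks.append(DenseBlock(dim, growth))
            dim += growth
        self.z_s_dim = dim

        self.sa_blocks = nn.ModuleList()
        dim = self.z_s_dim + action_dim
        for _ in range(num_blocks):
            self.sa_blocks.append(DenseBlock(dim, growth))
            dim += growth
        self.z_sa_dim = dim

        self.predict_next = nn.Linear(self.z_sa_dim, state_dim)  # auxiliary head

    def forward(self, state, action):
        z_s = state
        for block in self.state_blocks:
            z_s = block(z_s)
        z_sa = torch.cat([z_s, action], dim=-1)
        for block in self.sa_blocks:
            z_sa = block(z_sa)
        return z_s, z_sa

    def aux_loss(self, state, action, next_state):
        # Auxiliary objective: predict the next observation from z_sa.
        _, z_sa = self.forward(state, action)
        return nn.functional.mse_loss(self.predict_next(z_sa), next_state)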

Instructions

Recommended: Run with Docker

# python        3.6    (apt)
# pytorch       1.4.0  (pip)
# tensorflow    1.14.0 (pip)
# DMC Control Suite and MuJoCo
cd dockerfiles
docker build . -t rl-docker

For other dockerfiles, see RL Dockerfiles.

Launch experiments

Run with the scripts batch_run_main_d2rl_4seed_cuda.sh / batch_run_main_ofe_4seed_cuda.sh / batch_run_main_ofe_dense_4seed_cuda.sh / batch_run_ppo_d2rl_4seed_cuda.sh:

# e.g.
bash batch_run_main_ofe_4seed_cuda.sh Ant-v2 TD3_ofe 0 True # env_name: Ant-v2, algorithm: TD3_ofe, CUDA_Num: 0, layer_norm: True

bash batch_run_ppo_d2rl_4seed_cuda.sh Ant-v2 PPO_d2rl 0 # env_name: Ant-v2, algorithm: PPO_d2rl, CUDA_Num: 0

Plot results

# e.g. Note: `-l` sets the legend label, `data/DDPG_ofe-Hopper-v2/` is the directory of collected results,
# and `-s` sets the smoothing window.
python spinupUtils/plot.py \
    data/DDPG_ofe-Hopper-v2/ \
    -l DDPG_ofe -s 10

Performance on MuJoCo

Benchmarks include Ant-v2, HalfCheetah-v2, Hopper-v2, Humanoid-v2, and Walker2d-v2.

  • DDPG and its variants

  • TD3 and its variants

  • SAC and its variants

  • PPO and its variants

Citation

@misc{QingLi2021larger,
  author = {Qing Li},
  title = {Deeper and Larger Network Design for Continuous Control in RL},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LQNew/Deeper_Larger_Actor-Critic_RL}}
}