Deeper_Larger_Actor-Critic_RL
PyTorch implementation of large network design in continuous control RL.
Deeper and Larger Network Design for Continuous Control in RL
Implementation of large network designs in RL, with easy switching between toy tasks and challenging games. The code mainly follows three recent papers:
- 2020 ICML Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?
- 2020 NeurIPS Workshop D2RL: Deep Dense Architectures in Reinforcement Learning
- 2021 Arxiv Training Larger Networks for Deep Reinforcement Learning
In the code, we denote the method from *Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?* as `ofe`, the method from *D2RL: Deep Dense Architectures in Reinforcement Learning* as `d2rl`, and the method from *Training Larger Networks for Deep Reinforcement Learning* as `ofe_dense`. Note that we only implement the single-machine approach for `ofe_dense`, and we observe overfitting with it. We speculate that this is because the single-machine version is not as stable as the distributed approach.
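For orientation, below is a minimal PyTorch sketch of a `d2rl`-style actor, assuming the defining trait of D2RL: the raw observation is concatenated onto the input of every hidden layer. The class name and layer sizes are illustrative, not the repo's actual code.

```python
# Illustrative d2rl-style policy network (not the repo's exact implementation):
# the raw observation is concatenated to the input of each hidden layer.
import torch
import torch.nn as nn

class D2RLPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=256, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = obs_dim
        for _ in range(num_layers):
            self.layers.append(nn.Linear(in_dim, hidden_dim))
            in_dim = hidden_dim + obs_dim  # the next layer also sees the raw obs
        self.out = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs):
        x = torch.relu(self.layers[0](obs))
        for layer in self.layers[1:]:
            x = torch.relu(layer(torch.cat([x, obs], dim=-1)))
        return torch.tanh(self.out(x))

# Example: pi = D2RLPolicy(obs_dim=11, act_dim=3); act = pi(torch.randn(1, 11))
```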
Supported algorithms
algorithm | continuous control | on-policy / off-policy |
---|---|---|
Proximal Policy Optimization (PPO) coupled with d2rl | :white_check_mark: | on-policy |
Deep Deterministic Policy Gradients (DDPG) coupled with d2rl | :white_check_mark: | off-policy |
Deep Deterministic Policy Gradients (DDPG) coupled with ofe | :white_check_mark: | off-policy |
Deep Deterministic Policy Gradients (DDPG) coupled with ofe_dense | :white_check_mark: | off-policy |
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with d2rl | :white_check_mark: | off-policy |
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe | :white_check_mark: | off-policy |
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe_dense | :white_check_mark: | off-policy |
Soft Actor-Critic (SAC) coupled with d2rl | :white_check_mark: | off-policy |
Soft Actor-Critic (SAC) coupled with ofe | :white_check_mark: | off-policy |
Soft Actor-Critic (SAC) coupled with ofe_dense | :white_check_mark: | off-policy |
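As a rough sketch of what the `ofe` coupling means (following the idea of the OFENet paper rather than this repo's exact code): a feature extractor is trained online with an auxiliary next-state prediction loss, and the actor/critic receive the augmented input `[s, phi(s)]`; `ofe_dense` additionally uses a DenseNet-style extractor. All names below are illustrative assumptions.

```python
# Illustrative ofe-style online feature extractor (assumed behaviour, not the
# repo's exact code): phi(s) is trained to help predict s_{t+1} from (s_t, a_t),
# and the RL agent consumes the augmented observation [s, phi(s)].
import torch
import torch.nn as nn

class OFE(nn.Module):
    def __init__(self, obs_dim, act_dim, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.predictor = nn.Linear(feat_dim + act_dim, obs_dim)  # auxiliary head

    def features(self, obs):
        # Input actually fed to the actor/critic.
        return torch.cat([obs, self.encoder(obs)], dim=-1)

    def aux_loss(self, obs, act, next_obs):
        pred = self.predictor(torch.cat([self.encoder(obs), act], dim=-1))
        return ((pred - next_obs) ** 2).mean()
```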
Instructions
Recommended: run with Docker
# python 3.6 (apt)
# pytorch 1.4.0 (pip)
# tensorflow 1.14.0 (pip)
# DMC Control Suite and MuJoCo
cd dockerfiles
docker build . -t rl-docker
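Once the image is built and a container is running, a quick sanity check (illustrative only; the versions follow the Dockerfile comments above) is to confirm that PyTorch sees the GPU and that a MuJoCo environment can be created:

```python
# Sanity check inside the container (illustrative).
import torch
import gym

print(torch.__version__, torch.cuda.is_available())
env = gym.make("Ant-v2")  # requires the MuJoCo setup provided by the image
obs = env.reset()
print(obs.shape, env.action_space)
```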
For other Dockerfiles, you can go to RL Dockerfiles.
Launch experiments
Run with one of the scripts `batch_run_main_d2rl_4seed_cuda.sh` / `batch_run_main_ofe_4seed_cuda.sh` / `batch_run_main_ofe_dense_4seed_cuda.sh` / `batch_run_ppo_d2rl_4seed_cuda.sh`:
# e.g.
bash batch_run_main_ofe_4seed_cuda.sh Ant-v2 TD3_ofe 0 True # env_name: Ant-v2, algorithm: TD3_ofe, CUDA_Num: 0, layer_norm: True
bash batch_run_ppo_d2rl_4seed_cuda.sh Ant-v2 PPO_d2rl 0 # env_name: Ant-v2, algorithm: PPO_d2rl, CUDA_Num: 0
Plot results
# e.g. Note: `-l` gives the curve label, `data/DDPG_ofe-Hopper-v2/` is the directory of collected results,
# and `-s` sets the smoothing value.
python spinupUtils/plot.py \
data/DDPG_ofe-Hopper-v2/ \
-l DDPG_ofe -s 10
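If you just want a quick look at a single run without the Spinning Up plotter, a moving average similar to what `-s` presumably applies can be reproduced in a few lines; the progress file name and column names below are assumptions about the log format.

```python
# Minimal hand-rolled alternative to spinupUtils/plot.py for one run
# (file layout and column names are assumptions).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data/DDPG_ofe-Hopper-v2/progress.txt", sep="\t")
window = 10  # plays the same role as the `-s` smoothing value
smoothed = df["AverageEpRet"].rolling(window, min_periods=1).mean()
plt.plot(df["TotalEnvInteracts"], smoothed, label="DDPG_ofe")
plt.xlabel("environment steps")
plt.ylabel("average return")
plt.legend()
plt.show()
```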
Performance on MuJoCo
Environments include Ant-v2, HalfCheetah-v2, Hopper-v2, Humanoid-v2, and Walker2d-v2.
- DDPG and its variants
- TD3 and its variants
- SAC and its variants
- PPO and its variants
Citation
@misc{QingLi2021larger,
author = {Qing Li},
title = {Deeper and Larger Network Design for Continuous Control in RL},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/LQNew/Deeper_Larger_Actor-Critic_RL}}
}