mujoco-benchmark
                                
Provides a full reinforcement learning benchmark on MuJoCo environments, including DDPG, SAC, TD3, PG, A2C, and PPO.
This repo only serves as a link to Tianshou's benchmark of MuJoCo environments. The latest benchmark is maintained under thu-ml/tianshou. See the full benchmark here.
Keywords: deep reinforcement learning, pytorch, mujoco, benchmark, performances, Tianshou, baseline
Tianshou's MuJoCo Benchmark
We benchmarked Tianshou algorithm implementations in 9 out of 13 environments from the MuJoCo Gym task suite.
For each supported algorithm and MuJoCo environment, we provide:
- Default hyperparameters used for benchmark and scripts to reproduce the benchmark;
- A comparison of performance (or code level details) with other open source implementations or classic papers;
- Graphs and raw data that can be used for research purposes;
- Log details obtained during training;
- Pretrained agents;
- Some hints on how to tune the algorithm.
Supported algorithms are listed below:
- Deep Deterministic Policy Gradient (DDPG), commit id
- Twin Delayed DDPG (TD3), commit id
- Soft Actor-Critic (SAC), commit id
- REINFORCE algorithm, commit id
- Natural Policy Gradient (NPG), commit id
- Advantage Actor-Critic (A2C), commit id
- Proximal Policy Optimization (PPO), commit id
- Trust Region Policy Optimization (TRPO), commit id
- Actor Critic using Kronecker-Factored Trust Region (ACKTR), commit id
Example benchmark
SAC
| Environment | Tianshou | SpinningUp (PyTorch) | SAC paper |
|---|---|---|---|
| Ant | 5850.2±475.7 | ~3980 | ~3720 | 
| HalfCheetah | 12138.8±1049.3 | ~11520 | ~10400 | 
| Hopper | 3542.2±51.5 | ~3150 | ~3370 | 
| Walker2d | 5007.0±251.5 | ~4250 | ~3740 | 
| Swimmer | 44.4±0.5 | ~41.7 | N | 
| Humanoid | 5488.5±81.2 | N | ~5200 | 
| Reacher | -2.6±0.2 | N | N | 
| InvertedPendulum | 1000.0±0.0 | N | N | 
| InvertedDoublePendulum | 9359.5±0.4 | N | N | 
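
The Tianshou column reports each result in the form mean±std. As a minimal sketch of how such an entry could be produced, the snippet below assumes each entry aggregates one return per random seed; the exact aggregation procedure is not stated here, so refer to the full benchmark for the authoritative definition. The function name `summarize_returns` and the input values are hypothetical placeholders, not benchmark data.

```python
import statistics


def summarize_returns(returns_per_seed):
    """Format a list of per-seed returns as 'mean±std' with one decimal."""
    mean = statistics.fmean(returns_per_seed)
    # Assumption: population standard deviation across seeds.
    # The benchmark may use a different convention.
    std = statistics.pstdev(returns_per_seed)
    return f"{mean:.1f}±{std:.1f}"


# Placeholder returns for five hypothetical seeds (not real results).
example_returns = [100.0, 110.0, 90.0, 105.0, 95.0]
print(summarize_returns(example_returns))
```

Reporting the spread alongside the mean makes it easier to judge whether a gap between two implementations is larger than the seed-to-seed variation.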
 
Other resources
- Spinningup Benchmark
- OpenAI Baselines Benchmark
- TODO and related discussions: 1, 2
