Continuous-PPO
Proximal Policy Optimization (Continuous Version) in PyTorch.
An implementation of Proximal Policy Optimization (PPO) for continuous control on MuJoCo environments. All hyperparameters follow the values reported in the paper.
For the Atari domain, look at this.
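At the core of PPO is the clipped surrogate objective. The snippet below is a minimal, illustrative sketch of that loss in PyTorch; the function and argument names are placeholders and not the exact code used in this repository.

```python
# Illustrative sketch of PPO's clipped surrogate objective (not the exact
# code in this repo); tensor names and the epsilon value are assumptions.
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio pi_theta(a|s) / pi_theta_old(a|s), computed in log space.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two; return the negated mean as a loss.
    return -torch.min(surr1, surr2).mean()
```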
Demos
Demo GIFs of the trained agent on Ant-v2, Walker2d-v2, and InvertedDoublePendulum-v2.
Results
Result plots for Ant-v2, Walker2d-v2, and InvertedDoublePendulum-v2.
Dependencies
- gym == 0.17.2
- mujoco-py == 2.0.2.13
- numpy == 1.19.1
- opencv_contrib_python == 3.4.0.12
- torch == 1.4.0
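As a quick sanity check that gym and mujoco-py are installed correctly, the sketch below rolls out random actions in one of the environments listed above. It uses the old gym 0.17 step/reset API that these pinned versions expect; the environment name is just an example.

```python
# Smoke test for the pinned gym/mujoco-py versions (old gym 0.17 API).
import gym

env = gym.make("Ant-v2")
state = env.reset()
for _ in range(10):
    action = env.action_space.sample()            # random continuous action
    state, reward, done, info = env.step(action)  # old 4-tuple step API
    if done:
        state = env.reset()
env.close()
```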
Installation
pip3 install -r requirements.txt
Usage
python3 main.py
- You may use the `Train_FLAG` flag to specify whether to train your agent (when it is `True`) or test it (when it is `False`).
- There are some pre-trained weights in the pre-trained models directory; to test the agent with them, put them in the root folder of the project and set `Train_FLAG` to `False`. A minimal sketch of this switch follows the list.
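The sketch below shows how such a `Train_FLAG` toggle could be wired up. The checkpoint file name and the placeholder branches are assumptions for illustration, not the repository's actual `main.py` logic.

```python
# Hypothetical sketch of the Train_FLAG switch; "weights.pth" is an assumed
# checkpoint name and the branches are placeholders, not the repo's real code.
import os
import torch

Train_FLAG = True  # True: train from scratch, False: test pre-trained weights

if Train_FLAG:
    print("Training the PPO agent...")            # placeholder for the training loop
else:
    checkpoint_path = "weights.pth"               # assumed file in the project root
    if os.path.exists(checkpoint_path):
        state_dict = torch.load(checkpoint_path)  # load pre-trained weights
        print("Loaded pre-trained weights for evaluation.")
    else:
        print("Put the pre-trained weights in the project root first.")
```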
Environments tested
- [x] Ant
- [x] InvertedDoublePendulum
- [x] Walker2d
- [ ] Hopper
- [ ] Humanoid
- [ ] Swimmer
- [ ] HalfCheetah
Reference
Proximal Policy Optimization Algorithms, Schulman et al., 2017
Acknowledgement
- @higgsfield for his PPO code.
- @OpenAI for Baselines.
- @Reinforcement Learning KR for their Policy Gradient Algorithms.