mario_rl
Super Mario Bros RL
- [x] Advantage Actor-Critic [1]
- [x] Parallel Advantage Actor-Critic [2]
- [x] Noisy Networks for Exploration [3]
- [x] Proximal Policy Optimization Algorithms [4]
- [x] Curiosity-driven Exploration by Self-supervised Prediction [5] (WIP)
1. Setup
Requirements
- Python 3.6
- gym-super-mario-bros
- OpenCV Python
- PyTorch
- tensorboardX
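To confirm the requirements above are installed before training, a small check like the following may help. It is a minimal sketch, not part of the repository: the helper `find_missing` is hypothetical, and the import names (`gym_super_mario_bros`, `cv2`, `torch`, `tensorboardX`) are the usual module names for these packages.

```python
import importlib.util

# Requirement name -> module name used in `import` statements.
# These import names are assumptions based on the usual packaging.
REQUIRED_MODULES = {
    "gym-super-mario-bros": "gym_super_mario_bros",
    "OpenCV Python": "cv2",
    "PyTorch": "torch",
    "tensorboardX": "tensorboardX",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the requirement names whose module cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any name printed as missing needs to be installed before running the training scripts.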
2. How to Train
Modify the parameters in mario_a2c.py as you like, then run:
python3 mario_a2c.py
or
python3 mario_ppo.py
3. How to Eval
Set the is_load_model and is_render parameters in mario_a2c.py as you like, then run:
python3 mario_a2c.py
or
python3 mario_ppo.py
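For evaluation, the two flags would plausibly be set as below. This is only a sketch: it assumes both are plain booleans in the script, and the values and comments describe the presumed meaning of each name, not behavior confirmed from the code.

```python
# Hypothetical values for evaluation; the defaults in mario_a2c.py may differ.
is_load_model = True  # assumed: restore previously saved weights instead of starting fresh
is_render = True      # assumed: draw the game window while the agent plays
```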
4. Loss/Reward Graph
This run uses A2C (PAAC) only.
This run uses ICM only, with no extrinsic reward (curiosity-driven).
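For readers unfamiliar with the curiosity-driven setting, the intrinsic reward in [5] is the scaled squared error between the predicted and the actual feature encoding of the next state. The sketch below illustrates that formula in plain Python; it is not the repository's implementation, and the function names, the `eta` default, and the `use_extrinsic` switch are assumptions made for illustration.

```python
from typing import Sequence


def intrinsic_reward(
    predicted_next: Sequence[float],
    actual_next: Sequence[float],
    eta: float = 0.01,  # scaling factor; the value here is an arbitrary assumption
) -> float:
    """Curiosity bonus: (eta / 2) * squared L2 distance between the
    forward model's predicted next-state features and the actual ones."""
    if len(predicted_next) != len(actual_next):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * squared_error


def training_reward(intrinsic: float, extrinsic: float, use_extrinsic: bool = False) -> float:
    """Reward passed to the policy update. With use_extrinsic=False the
    game reward is ignored, matching the 'no ext reward' setting."""
    return intrinsic + (extrinsic if use_extrinsic else 0.0)


if __name__ == "__main__":
    bonus = intrinsic_reward([1.0, 2.0], [1.0, 0.0], eta=0.5)
    print(training_reward(bonus, extrinsic=5.0))
```

States whose outcome the forward model predicts poorly yield a larger bonus, which is what pushes the agent toward unexplored parts of the level even when the game reward is withheld.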
References
[1] Actor-Critic Algorithms
[2] Efficient Parallel Methods for Deep Reinforcement Learning
[3] Noisy Networks for Exploration
[4] Proximal Policy Optimization Algorithms
[5] Curiosity-driven Exploration by Self-supervised Prediction