async-deep-flappybird
Using Asynchronous Deep Reinforcement Learning to play Flappy Bird from pixel input.
Asynchronous Deep ReinFlappyBird
This repository contains an implementation of Asynchronous Advantage Actor-Critic (A3C) that teaches an agent to play Flappy Bird.
Performance
Coming soon!
Technical Details
In my tests, these were the training speeds when using a CPU (Intel Xeon E5620, 2.40 GHz) or a GPU (NVIDIA GTX 1070).
| | FF | LSTM |
|---|---|---|
| CPU | 57 steps/s | TBA steps/s |
| GPU | 400 steps/s | 300 steps/s |
Settings
Here are some of the available flags you can set when training an agent; an example invocation is shown after the list. For the full list, see a3c.py.
Agent settings
- mode [train, display, visualize] - Which mode to activate when you start a session.
- use_gpu [True, False] - Whether to use a GPU to speed up the training process.
- parallel_agent_size - Number of parallel agents to use during training.
- action_size - Number of available actions.
- agent_type [FF, LSTM] - Which type of A3C network to train the agent with.
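As an illustration, a training run that overrides a few of these settings might look like the command below (assuming the settings are exposed as command-line flags with the same names; see a3c.py for the exact flag definitions):
$ python a3c.py --mode train --use_gpu True --parallel_agent_size 8 --agent_type LSTM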
Training and Optimizer settings
The current settings are based on, or borrowed directly from, the [implementation](https://github.com/miyosuda/async_deep_reinforce) by @miyosuda. They have not yet been tuned for Flappy Bird and are used as-is for now. If you find settings that perform better than the current ones, let me know! A small sketch illustrating two of these settings in code follows the list.
- max_time_step (40 000 000) - Maximum number of training steps.
- initial_alpha_low (-5) - LogUniform low limit for the learning rate (represents x in 10^x).
- initial_alpha_high (-3) - LogUniform high limit for the learning rate (represents x in 10^x).
- gamma (0.99) - Discount factor for rewards.
- entropy_beta (0.01) - Entropy regularization constant.
- grad_norm_clip (40.0) - Gradient norm clipping.
- rmsp_alpha (0.99) - Decay parameter for RMSProp.
- rmsp_epsilon (0.1) - Epsilon parameter for RMSProp.
- local_t_max (5) - Repeat step size (number of local steps collected before each gradient update).
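To make two of these settings more concrete, here is a minimal Python sketch (not code from this repository; the function names are hypothetical) of what a LogUniform learning-rate draw between 10^initial_alpha_low and 10^initial_alpha_high looks like, and of how gamma discounts the rewards collected over local_t_max steps:

```python
import random

def sample_log_uniform(alpha_low=-5, alpha_high=-3):
    # Draw an exponent uniformly, then map it through 10^x so that
    # learning rates are spread evenly on a logarithmic scale.
    return 10 ** random.uniform(alpha_low, alpha_high)

def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    # Walk backwards through the (up to local_t_max) rewards collected by
    # one agent, bootstrapping from the value estimate of the final state.
    returns = []
    ret = bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
        returns.append(ret)
    return list(reversed(returns))

# Example with the defaults above:
print(sample_log_uniform())                       # somewhere between 1e-5 and 1e-3
print(discounted_returns([0.1, 0.1, 0.1, 0.1, 1.0], bootstrap_value=0.0))
```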
Logging
- log_level [NONE, FULL] - Log level.
- average_summary - Number of episodes to average the summary over.
Display
- display_episodes - Number of episodes to display.
- average_summary - Number of episodes to average the summary over.
- display_log_level [NONE, MID, FULL] - Display log level. NONE prints only an end summary, MID prints a summary per episode, and FULL prints the π-values, state value and reward for every state.
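For example, a display session that replays ten episodes with full per-state output might be started like this (again assuming the settings are exposed as command-line flags with the names above; check a3c.py for the exact definitions):
$ python a3c.py --mode display --display_episodes 10 --display_log_level FULL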
Getting started
To start a training session with the default parameters, run:
$ python a3c.py
To check your progress and compare different experiments in real time, navigate to your async-deep-flappybird folder and start TensorBoard by running:
$ tensorboard --logdir summaries/
Enjoy!
Credit
A3C - The A3C implementation used is a modified version of the one by @miyosuda.
Flappy Bird - The Flappy Bird implementation is based on a version by @yenchenlin, with some minor adjustments.
—
2016, Babak Toghiani-Rizi