Reinforcement Learning Project

About

This project uses Q-learning, a model-free reinforcement learning technique, to find an optimal action-selection policy for playing Gomoku (Five-in-a-Row). It builds two Gomoku agents that play against each other.

Some terms:

Markov Decision Process (MDP): a decision process in which the current state alone carries enough information to choose actions that maximize future rewards.
Q function: given a state and an action, outputs the "value" of that state-action pair.
Policy: a strategy for choosing an action given the state, the available actions, and the values of all state-action pairs.

Deep Q-learning: train a deep neural network to serve as the action-value (Q) function.
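To make the terms above concrete, here is a minimal sketch of the standard Q-learning update in its tabular form (a lookup table rather than a neural network). It is not taken from this repo: the function name `q_learning_update`, the string states, and the values of `alpha` and `gamma` are assumptions chosen for illustration. In deep Q-learning, the same `target` is what the network would be trained to predict.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, next_actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q value.

    q is a dict mapping (state, action) pairs to values; alpha is the
    learning rate and gamma the discount factor (both hypothetical here).
    """
    # Best value reachable from the next state; 0 if no actions remain.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
# A terminal transition with reward 1: the estimate moves halfway to 1.
first = q_learning_update(q, "s0", "a0", reward=1.0,
                          next_state="s1", next_actions=[])
```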

Details

Policy used: epsilon-greedy action selection. With probability epsilon the agent picks a random move to explore; otherwise it takes the greedy action.
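A minimal sketch of epsilon-greedy selection, assuming the Q values of the legal moves are already available as a dict. The function name `epsilon_greedy` and the board coordinates are hypothetical, and the "greedy" branch here is the plain largest-Q move rather than the two-agent variant this project uses.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a move from q_values, a dict mapping legal moves to Q values."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among the legal moves.
        return rng.choice(list(q_values))
    # Exploit: choose the move with the largest estimated value.
    return max(q_values, key=q_values.get)


q_values = {(7, 7): 0.2, (7, 8): 0.9, (8, 8): -0.1}
greedy_move = epsilon_greedy(q_values, epsilon=0.0)
```

Setting `epsilon=0.0` disables exploration entirely, so the call always returns the highest-valued move.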

Greedy action:

Since the two agents play against each other, the "greedy action" is the one that balances between
- suppressing the opponent's max Q value at the next step
- promoting the agent's own max Q value the next time it plays (assuming the opponent replies with its max-Q move).
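The two criteria above could be combined roughly as follows. This is only a sketch under stated assumptions: the source does not say how the two terms are weighted, so the `weight` parameter, the function name `adversarial_greedy`, and the three callbacks (`legal_moves`, `apply_move`, `q_value`) are all hypothetical, and the three-cell toy game exists only to make the example runnable.

```python
def adversarial_greedy(state, legal_moves, apply_move, q_value, weight=0.5):
    """Pick the move that trades off own future value against the opponent's.

    q_value(s, m) is assumed to score move m for whoever is to move in s.
    """
    def best(s):
        moves = legal_moves(s)
        if not moves:
            return None, 0.0  # no moves left: treat the value as 0
        move = max(moves, key=lambda m: q_value(s, m))
        return move, q_value(s, move)

    def score(move):
        after_me = apply_move(state, move)
        # Opponent's best reply and its Q value (the term to suppress).
        reply, opp_max = best(after_me)
        if reply is None:
            own_max = 0.0
        else:
            # Own best Q value after the opponent's assumed max-Q reply.
            _, own_max = best(apply_move(after_me, reply))
        return weight * own_max - (1.0 - weight) * opp_max

    return max(legal_moves(state), key=score)


# Toy game: three cells whose Q value depends only on the cell (made up).
CELL_VALUE = {0: 0.1, 1: 0.9, 2: 0.4}


def legal(s):
    return [c for c in CELL_VALUE if c not in s]


def play(s, m):
    return s + (m,)


def q(s, m):
    return CELL_VALUE[m]


chosen = adversarial_greedy((), legal, play, q)
```

In the toy game, taking cell 1 leaves the opponent with the weakest best reply, which outweighs the lower value left for the agent's own next turn.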

Experience replay:

The neural network is updated with experience replay: (X, y) pairs are stored in a fixed-length list (the memory), and a batch of (X, y) pairs is randomly sampled from it periodically to update the network.
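A fixed-length memory with random batch sampling can be sketched with the standard library alone. The class name `ReplayMemory`, its method names, and the capacity used below are assumptions for illustration, not the repo's actual code; the integers stand in for real (X, y) training pairs.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-length store of (X, y) pairs; the oldest pairs are dropped first."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest entry once it is full.
        self.samples = deque(maxlen=capacity)

    def store(self, x, y):
        self.samples.append((x, y))

    def sample_batch(self, batch_size):
        # Never ask for more pairs than are currently stored.
        batch_size = min(batch_size, len(self.samples))
        batch = random.sample(list(self.samples), batch_size)
        xs = [x for x, _ in batch]
        ys = [y for _, y in batch]
        return xs, ys

    def __len__(self):
        return len(self.samples)


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.store(i, i * 10)  # placeholder features and targets

xs, ys = memory.sample_batch(2)
```

The returned `xs` and `ys` would then be passed to one training step of the network. Sampling at random, rather than training on the most recent moves in order, reduces the correlation between consecutive samples.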

When Playing:

When actually playing, the agent simply chooses the move with the largest state-action value (Q value).

Usage

Clone the repo and set up a virtualenv:

git clone https://github.com/AaronYALai/Reinforcement_Learning_Project

cd Reinforcement_Learning_Project

virtualenv venv

source venv/bin/activate

Install all dependencies and train agents:

pip install -r requirements.txt

python train_agent.py

python agents_play.py

python human_play.py

Reference: Q-learning with Neural Network

← Metadata

Owner

Metadata

Reinforcement_Learning_Project Reinforcement_Learning_Project copied to clipboard

Metadata

Reinforcement Learning Project

About

Details

Usage

Reference: Q-learning with Neural Network

← Metadata

Owner

Metadata

Reinforcement_Learning_Project
Reinforcement_Learning_Project copied to clipboard