TD-Gammon

Features
Installation
How to interact with GNU Backgammon using Python Script?
Usage
- Train TD-Network
- Evaluate Agent(s)
- Web Interface
- Plot Wins
Backgammon OpenAI Gym Environment
Bibliography, sources of inspiration, related works
License

Features

PyTorch implementation of TD-Gammon [1].
Test the trained agents against an open source implementation of the Backgammon game, GNU Backgammon.
Play against a trained agent via a web GUI.

Installation

I used Anaconda3 with Python 3.6.8. This is the only configuration I tested.

Create the conda environment:

$ conda create --name tdgammon python=3.6
$ source activate tdgammon
(tdgammon) $ git clone https://github.com/dellalibera/td-gammon.git

Install the environment gym-backgammon:

(tdgammon) $ git clone https://github.com/dellalibera/gym-backgammon.git
(tdgammon) $ cd gym-backgammon
(tdgammon) $ pip install -e .

Install the dependencies (PyTorch v1.2):

(tdgammon) $ pip install torch torchvision
(tdgammon) $ pip install tb-nightly

(tdgammon) $ cd ../td-gammon/
(tdgammon) $ pip install -r requirements.txt

Without Anaconda Environment

If you don't use an Anaconda environment, run the following commands:

git clone https://github.com/dellalibera/td-gammon.git
pip3 install -r td-gammon/requirements.txt
git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .

Without an Anaconda environment, also replace python with python3 in the commands below.

GNU Backgammon

To play against gnubg, you first need to install it.
NOTE: I installed gnubg on Ubuntu 18.04 (running in a virtual machine) with Python 2.7. The next section explains how to interact with GNU Backgammon.

On Ubuntu:

sudo apt-get install gnubg

How to interact with GNU Backgammon using Python Script?

I used an HTTP server running on the guest machine (Ubuntu) to receive commands and pass them to the gnubg program.
This makes it possible to send commands from the host machine (macOS in my case).

The file bridge.py should be executed on the Guest Machine (the machine where gnubg is installed).

On Ubuntu:

gnubg -t -p /path/to/bridge.py

This runs gnubg with the command-line interface instead of the graphical one (-t) and evaluates a Python code file, then exits (-p).
For a list of gnubg parameters, run gnubg --help.

The Python script bridge.py starts an HTTP server on localhost:8001.
To change the host and the port, edit the following lines in bridge.py:

if __name__ == "__main__":
    HOST = 'localhost' # <-- YOUR HOST HERE
    PORT = 8001  # <-- YOUR PORT HERE
    run(host=HOST, port=PORT)

The file td_gammon/gnubg/gnubg_backgammon.py sends messages/commands to gnubg and parses the response.
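
To make the idea concrete, here is a minimal sketch of how a client on the host machine could send one command to the bridge over HTTP. It is not the code in gnubg_backgammon.py: the function names and the JSON body with a "command" key are assumptions for illustration, so check bridge.py for the payload it actually expects.

```python
import json
import urllib.request


def build_command_request(host, port, command):
    """Build (but do not send) an HTTP POST carrying one gnubg command.

    The JSON body with a "command" key is an assumption for illustration;
    bridge.py may expect a different payload format.
    """
    url = "http://{}:{}".format(host, port)
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(host, port, command, timeout=10):
    """Send the command to the bridge and return the response body as text."""
    request = build_command_request(host, port, command)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the bridge running, a call such as send_command("localhost", 8001, "new game") would forward the gnubg command new game, provided the payload format matches.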

Usage

Run python /path/to/main.py --help for a list of parameters.

Train TD-Network

To train a neural network with a single hidden layer of 40 units for 100000 games/episodes, saving the model every 10000 episodes, run the following command:

(tdgammon) $ python /path/to/main.py train --save_path ./saved_models/exp1 --save_step 10000 --episodes 100000 --name exp1 --type nn --lr 0.1 --hidden_units 40

Run python /path/to/main.py train --help for a list of parameters available for training.
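
For intuition about what training does, the sketch below shows a TD(λ) update with eligibility traces over one episode. It is a simplified illustration, not the code in this repository: it swaps the neural network for a linear value function so the gradient is just the feature vector, and the function name, the feature vectors and the value lam=0.7 are assumptions. Only lr=0.1 mirrors the command above.

```python
def td_lambda_episode(weights, states, final_reward, lr=0.1, lam=0.7):
    """Apply TD(lambda) updates along one episode for a linear value function.

    weights: list of floats, updated in place and returned.
    states: list of feature vectors visited during the episode, in order.
    final_reward: outcome observed at the end (e.g. 1.0 for a win, 0.0 for a loss).
    No discounting is applied (gamma = 1).
    """
    def value(features):
        return sum(w * x for w, x in zip(weights, features))

    traces = [0.0] * len(weights)
    for t, features in enumerate(states):
        current = value(features)
        # The target is the next state's value, or the final outcome at the last step.
        is_last = t == len(states) - 1
        target = final_reward if is_last else value(states[t + 1])
        # For a linear model, the gradient of the value is the feature vector itself.
        traces = [lam * e + x for e, x in zip(traces, features)]
        delta = target - current
        for i in range(len(weights)):
            weights[i] += lr * delta * traces[i]
    return weights
```

The eligibility traces let the final outcome update the weights of states visited earlier in the game, with less credit the further back they are.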

Evaluate Agent(s)

To evaluate already trained models, you have two options: let two models play against each other, or let one model play against gnubg.
Run python /path/to/main.py evaluate --help for a list of parameters available for evaluation.

Agent vs Agent

To let two models play against each other, specify the path where each model is saved along with its number of hidden units.

(tdgammon) $ python /path/to/main.py evaluate --episodes 50 --hidden_units_agent0 40 --hidden_units_agent1 40 --type nn --model_agent0 path/to/saved_models/agent0.tar --model_agent1 path/to/saved_models/agent1.tar
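
Conceptually, an agent-vs-agent evaluation plays a fixed number of games and tallies the wins. The sketch below illustrates that loop only. The names evaluate and play_game are hypothetical, and play_game stands in for a full game between the two loaded agents.

```python
import random


def evaluate(play_game, episodes):
    """Play `episodes` games and return the win rate of each agent.

    play_game: hypothetical callable that plays one full game and returns
    the index of the winner (0 or 1).
    """
    wins = [0, 0]
    for _ in range(episodes):
        wins[play_game()] += 1
    return [count / episodes for count in wins]


if __name__ == "__main__":
    # Stand-in for a real game: a seeded coin flip decides the winner.
    rng = random.Random(0)
    rates = evaluate(lambda: rng.randint(0, 1), episodes=50)
    print("agent0: {:.0%}, agent1: {:.0%}".format(*rates))
```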

Agent vs gnubg

To let one model play against gnubg, first run gnubg with the bridge script as input.
On Ubuntu (or wherever gnubg is installed):

gnubg -t -p /path/to/bridge.py

Then run (to play against gnubg at beginner level for 50 games):

(tdgammon) $ python /path/to/main.py evaluate --episodes 50 --hidden_units_agent0 40 --type nn --model_agent0 path/to/saved_models/agent0.tar vs_gnubg --difficulty beginner --host GNUBG_HOST --port GNUBG_PORT

The number of hidden units (--hidden_units_agent0) must match that of the loaded model (--model_agent0).
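
The reason is that the saved weights only fit a network of the same shape. The sketch below shows the kind of check involved. It is an illustration only: check_hidden_units is a hypothetical helper, and the weights are represented as a plain list of rows (one row per hidden unit), which is an assumed layout and not the checkpoint format used by this repository.

```python
def check_hidden_units(hidden_layer_weights, hidden_units):
    """Raise ValueError if the requested size does not match the saved weights.

    hidden_layer_weights: weight matrix as a list of rows, one row per hidden
    unit (a hypothetical layout used only for this illustration).
    """
    saved_units = len(hidden_layer_weights)
    if saved_units != hidden_units:
        raise ValueError(
            "model was saved with {} hidden units, but {} were requested".format(
                saved_units, hidden_units
            )
        )
    return saved_units
```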

Web Interface

You can play against a trained agent via a web GUI:

(tdgammon) $ python /path/to/main.py gui --host localhost --port 8002 --model path/to/saved_models/agent0.tar --hidden_units 40 --type nn

Then navigate to http://localhost:8002 in your browser.

Run python /path/to/main.py gui --help for a list of parameters available for the web GUI.

Plot Wins

Evaluating the agent during training can take a long time, especially against gnubg at difficulty world_class. Instead, you can load all the models saved in a folder and evaluate each one (saved at a different point during training) against one or more opponents.
The models in the directory should be of the same type (i.e. the structure of the network should be the same for all the models in the same folder).
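
As an illustration of the first step, the sketch below collects the saved models from a folder in the order they were written. It is not the repository's code: list_checkpoints is a hypothetical helper, and both the "*.tar" pattern (taken from the agent0.tar examples above) and the ordering by modification time are assumptions.

```python
from pathlib import Path


def list_checkpoints(folder, pattern="*.tar"):
    """Return checkpoint files in `folder`, oldest first by modification time.

    Files with the same modification time are ordered by name.
    """
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))
```

Each returned path could then be loaded and evaluated in turn, giving one data point per saved model.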

To plot the wins against gnubg, first start gnubg on Ubuntu (or wherever gnubg is installed):

gnubg -t -p /path/to/bridge.py

In the example below, the trained models are evaluated against a random opponent and against gnubg at two difficulty levels, beginner and advanced:

(tdgammon) $ python /path/to/main.py plot --save_path /path/to/saved_models/myexp --hidden_units 40 --episodes 10 --opponent random,gnubg --dst /path/to/experiments --type nn --difficulty beginner,advanced --host GNUBG_HOST --port GNUBG_PORT

To visualize the plots:

(tdgammon) $ tensorboard --logdir=runs/path/to/experiment/ --host localhost --port 8001

Run python /path/to/main.py plot --help for a list of parameters available for plotting.

Backgammon OpenAI Gym Environment

For a detailed description of the environment, see gym-backgammon.

Bibliography, sources of inspiration, related works

TD-Gammon and Temporal Difference Learning:
- [1] Practical Issues in Temporal Difference Learning
- Temporal Difference Learning and TD-Gammon
- Programming backgammon using self-teaching neural nets
- Implementation Details TD-Gammon
- Chapter 9 Temporal-Difference Learning
- Implementation Details of the TD(λ) Procedure for the Case of Vector Predictions and Backpropagation
- Learning to Predict by the Methods of Temporal Differences
GNU Backgammon: https://www.gnu.org/software/gnubg/
Rules of Backgammon:
- www.bkgm.com/rules.html
- https://en.wikipedia.org/wiki/Backgammon
- Starting Position: http://www.bkgm.com/gloss/lookup.cgi?starting+position
- https://bkgm.com/faq/
Install GNU Backgammon on Ubuntu:
- https://ubuntuforums.org/showthread.php?t=2217668
- https://ubuntuforums.org/showthread.php?t=1506341
- https://www.reddit.com/r/backgammon/comments/5gpkov/installing_gnu_or_xg_on_linux/
How to use python to interact with gnubg: [Bug-gnubg] Documentation: Looking for documentation on python scripting
Other implementations of the Backgammon OpenAI Gym Environment:
- https://github.com/edusta/gym-backgammon
Other implementations of TD-Gammon:
- https://github.com/TobiasVogt/TD-Gammon
- https://github.com/millerm/TD-Gammon
- https://github.com/fomorians/td-gammon
How to set up your VMware Fusion images to use static IP addresses on Mac OS X:
- https://gist.github.com/pjkelly/1068716/6d19faa0122c0e1efe350e818bb8f4e8687ea1ab
PyTorch Tensorboard: https://pytorch.org/docs/stable/tensorboard.html

License

MIT
