TD-Gammon implementation
TD-Gammon
Table of Contents
- Features
- Installation
- How to interact with GNU Backgammon using Python Script?
- Usage
  - Train TD-Network
  - Evaluate Agent(s)
  - Web Interface
  - Plot Wins
- Backgammon OpenAI Gym Environment
- Bibliography, sources of inspiration, related works
- License
Features
- PyTorch implementation of TD-Gammon [1].
- Test the trained agents against an open source implementation of the Backgammon game, GNU Backgammon.
- Play against a trained agent via a web GUI.
Installation
I used Anaconda3 with Python 3.6.8 (this is the only configuration I tested).
Create the conda environment:
$ conda create --name tdgammon python=3.6
$ source activate tdgammon
(tdgammon) $ git clone https://github.com/dellalibera/td-gammon.git
Install the gym-backgammon environment:
(tdgammon) $ git clone https://github.com/dellalibera/gym-backgammon.git
(tdgammon) $ cd gym-backgammon
(tdgammon) $ pip install -e .
Install the dependencies (PyTorch v1.2):
(tdgammon) $ pip install torch torchvision
(tdgammon) $ pip install tb-nightly
or
(tdgammon) $ cd ../td-gammon/
(tdgammon) $ pip install -r requirements.txt
Without Anaconda Environment
If you don't use an Anaconda environment, run the following commands:
git clone https://github.com/dellalibera/td-gammon.git
pip3 install -r td-gammon/requirements.txt
git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .
Without an Anaconda environment, also replace python with python3 in the commands below.
GNU Backgammon
To play against gnubg, you have to install it.
NOTE: I installed gnubg on Ubuntu 18.04 (running on a virtual machine) with Python 2.7 (see the next section for how to interact with GNU Backgammon).
On Ubuntu:
sudo apt-get install gnubg
How to interact with GNU Backgammon using Python Script?
I used an HTTP server that runs on the guest machine (Ubuntu) to receive commands and interact with the gnubg program.
This makes it possible to send commands from the host machine (macOS in my case).
The file bridge.py should be executed on the guest machine (the machine where gnubg is installed).
On Ubuntu:
gnubg -t -p /path/to/bridge.py
This runs gnubg with the command-line interface instead of the graphical interface (-t), evaluates a Python file, and then exits (-p).
For a list of gnubg parameters, run gnubg --help.
The Python script bridge.py creates an HTTP server running on localhost:8001.
If you want to modify the host and the port, change the following lines in bridge.py:
if __name__ == "__main__":
    HOST = 'localhost'  # <-- YOUR HOST HERE
    PORT = 8001  # <-- YOUR PORT HERE
    run(host=HOST, port=PORT)
The file td_gammon/gnubg/gnubg_backgammon.py sends messages/commands to gnubg and parses the response.
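As a rough illustration of the bridge idea described above, the sketch below runs a small HTTP server that accepts a command in a POST body and returns a JSON reply, together with a client that sends a command and parses the response. It is not the actual bridge.py or gnubg_backgammon.py: the JSON fields (command, result), the handle_command stub, and the use of Python 3's http.server are assumptions made for illustration. The real bridge runs inside gnubg's embedded Python and would forward the command to gnubg instead of calling the stub.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_command(command):
    """Hypothetical stand-in: the real bridge would pass the command to gnubg."""
    return "received: " + command


class BridgeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON body sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        reply = json.dumps({"result": handle_command(payload["command"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer((host, port), BridgeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(host, port, command):
    """Client side: POST a command and return the parsed 'result' field."""
    body = json.dumps({"command": command}).encode("utf-8")
    connection = HTTPConnection(host, port, timeout=5)
    try:
        connection.request("POST", "/", body=body,
                           headers={"Content-Type": "application/json"})
        response = connection.getresponse()
        return json.loads(response.read().decode("utf-8"))["result"]
    finally:
        connection.close()


if __name__ == "__main__":
    server = start_server()
    print(send_command("127.0.0.1", server.server_address[1], "new game"))
    server.shutdown()
    server.server_close()
```

In a guest/host setup like the one described, the server part would run on the guest and send_command would be called from the host with the guest's address and port.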
Usage
Run python /path/to/main.py --help for a list of parameters.
Train TD-Network
To train a neural network with a single hidden layer of 40 units for 100000 games/episodes, saving the model every 10000 episodes, run the following command:
(tdgammon) $ python /path/to/main.py train --save_path ./saved_models/exp1 --save_step 10000 --episodes 100000 --name exp1 --type nn --lr 0.1 --hidden_units 40
Run python /path/to/main.py train --help for a list of parameters available for training.
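To give an idea of what training such a network involves, here is a minimal sketch of a TD(λ) update with eligibility traces for a network with one hidden layer. It is not the code in this repository (which uses PyTorch): it is written in plain Python so that it runs without dependencies, and the λ value, the absence of bias terms, the weight initialization, and the toy feature vectors are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TDNetwork:
    """One sigmoid hidden layer and a single sigmoid output (no biases, for brevity)."""

    def __init__(self, n_inputs, n_hidden, lr=0.1, lamda=0.7, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.lr, self.lamda = lr, lamda
        self.reset_traces()

    def reset_traces(self):
        # One eligibility trace per weight; cleared at the start of every game.
        self.e1 = [[0.0] * len(row) for row in self.w1]
        self.e2 = [0.0] * len(self.w2)

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        value = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)))
        return value, hidden

    def td_update(self, x, target):
        """e <- lambda * e + grad V(x), then w <- w + lr * (target - V(x)) * e."""
        value, hidden = self.forward(x)
        delta = target - value
        d_out = value * (1.0 - value)  # derivative of the output sigmoid
        for j, h in enumerate(hidden):
            d_hidden = d_out * self.w2[j] * h * (1.0 - h)
            self.e2[j] = self.lamda * self.e2[j] + d_out * h
            for i, xi in enumerate(x):
                self.e1[j][i] = self.lamda * self.e1[j][i] + d_hidden * xi
        # Apply the update only after all traces use the old weights.
        for j in range(len(self.w2)):
            self.w2[j] += self.lr * delta * self.e2[j]
            for i in range(len(x)):
                self.w1[j][i] += self.lr * delta * self.e1[j][i]
        return delta


def train_episode(net, states, reward):
    """states: feature vectors visited in one game; reward: final outcome (1.0 = win)."""
    net.reset_traces()
    for t, x in enumerate(states):
        if t + 1 < len(states):
            target, _ = net.forward(states[t + 1])  # bootstrap from the next prediction
        else:
            target = reward  # last step: use the actual outcome
        net.td_update(x, target)


if __name__ == "__main__":
    rng = random.Random(1)
    states = [[rng.random() for _ in range(8)] for _ in range(5)]  # toy "game"
    net = TDNetwork(n_inputs=8, n_hidden=40)
    before, _ = net.forward(states[-1])
    for _ in range(200):
        train_episode(net, states, reward=1.0)
    after, _ = net.forward(states[-1])
    print("value of final state: %.3f -> %.3f" % (before, after))
```

In this sketch, lr and n_hidden play the roles of the --lr and --hidden_units options in the command above; in real training the states would come from self-play games rather than random vectors.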
Evaluate Agent(s)
To evaluate already trained models, you have two options: let two models play against each other, or evaluate one model against gnubg.
Run python /path/to/main.py evaluate --help for a list of parameters available for evaluation.
Agent vs Agent
To evaluate two models against each other, specify the path where each model is saved, together with its number of hidden units.
(tdgammon) $ python /path/to/main.py evaluate --episodes 50 --hidden_units_agent0 40 --hidden_units_agent1 40 --type nn --model_agent0 path/to/saved_models/agent0.tar --model_agent1 path/to/saved_models/agent1.tar
Agent vs gnubg
To evaluate one model against gnubg, first run gnubg with the bridge.py script as input.
On Ubuntu (or wherever gnubg is installed):
gnubg -t -p /path/to/bridge.py
Then run the following command (to play against gnubg at the beginner level for 50 games):
(tdgammon) $ python /path/to/main.py evaluate --episodes 50 --hidden_units_agent0 40 --type nn --model_agent0 path/to/saved_models/agent0.tar vs_gnubg --difficulty beginner --host GNUBG_HOST --port GNUBG_PORT
The number of hidden units (--hidden_units_agent0) must match that of the loaded model (--model_agent0).
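In the gnubg command above, vs_gnubg comes after the agent options and has its own options, which reads like a nested sub-command. The sketch below shows how such a layout can be expressed with argparse. It assumes this is how the options are grouped; the real main.py may be organized differently. The flag names are copied from the commands above, while the types, defaults, and help texts are guesses.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    commands = parser.add_subparsers(dest="command")

    evaluate = commands.add_parser("evaluate", help="evaluate trained agent(s)")
    evaluate.add_argument("--episodes", type=int, default=50)
    evaluate.add_argument("--type", default="nn")
    evaluate.add_argument("--model_agent0", required=True)
    evaluate.add_argument("--hidden_units_agent0", type=int, default=40)
    evaluate.add_argument("--model_agent1")
    evaluate.add_argument("--hidden_units_agent1", type=int, default=40)

    # Optional nested sub-command: only given when the opponent is gnubg.
    opponents = evaluate.add_subparsers(dest="opponent")
    vs_gnubg = opponents.add_parser("vs_gnubg", help="play against gnubg via the bridge")
    vs_gnubg.add_argument("--difficulty", default="beginner")
    vs_gnubg.add_argument("--host", required=True)
    vs_gnubg.add_argument("--port", type=int, required=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "evaluate", "--episodes", "50", "--hidden_units_agent0", "40",
        "--type", "nn", "--model_agent0", "path/to/saved_models/agent0.tar",
        "vs_gnubg", "--difficulty", "beginner", "--host", "localhost", "--port", "8001",
    ])
    print(args.command, args.opponent, args.difficulty, args.host, args.port)
```

With this layout, leaving out vs_gnubg leaves args.opponent as None, which a program could use to fall back to the agent vs agent case.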
Web Interface
You can play against a trained agent via a web GUI:
(tdgammon) $ python /path/to/main.py gui --host localhost --port 8002 --model path/to/saved_models/agent0.tar --hidden_units 40 --type nn
Then navigate to http://localhost:8002 in your browser.
Run python /path/to/main.py gui --help for a list of parameters available for the web GUI.
Plot Wins
Instead of evaluating the agent during training (which can take a long time, especially against gnubg at the world_class difficulty), you can load all the models saved in a folder and evaluate each one (saved at different times during training) against one or more opponents.
The models in the directory should be of the same type (i.e., the network structure should be the same for all the models in the same folder).
To plot the wins against gnubg, run on Ubuntu (or wherever gnubg is installed):
gnubg -t -p /path/to/bridge.py
In the example below, the trained models are evaluated against gnubg at two difficulty levels, beginner and advanced:
(tdgammon) $ python /path/to/main.py plot --save_path /path/to/saved_models/myexp --hidden_units 40 --episodes 10 --opponent random,gnubg --dst /path/to/experiments --type nn --difficulty beginner,advanced --host GNUBG_HOST --port GNUBG_PORT
To visualize the plots:
(tdgammon) $ tensorboard --logdir=runs/path/to/experiment/ --host localhost --port 8001
Run python /path/to/main.py plot --help for a list of parameters available for plotting.
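The idea behind this command (evaluate every model saved during training and look at the win rate over time) can be sketched as follows. This is only an illustration, not the repository's plotting code: it assumes that checkpoint files end in .tar and carry the training step as the last number in their name (a hypothetical naming scheme such as exp1_10000.tar), and evaluate_fn is a placeholder for whatever function plays the games and returns the number of wins.

```python
import re
import tempfile
from pathlib import Path


def list_checkpoints(folder):
    """Return (step, path) pairs for *.tar files, sorted by training step."""
    checkpoints = []
    for path in Path(folder).glob("*.tar"):
        numbers = re.findall(r"\d+", path.stem)
        if numbers:  # skip files without a step number in their name
            checkpoints.append((int(numbers[-1]), path))
    return sorted(checkpoints, key=lambda item: item[0])


def win_rate_curve(folder, evaluate_fn, episodes):
    """Evaluate each checkpoint and return a list of (step, win_rate) points."""
    curve = []
    for step, path in list_checkpoints(folder):
        wins = evaluate_fn(path, episodes)  # placeholder: number of games won
        curve.append((step, wins / episodes))
    return curve


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Create empty stand-in checkpoint files for the demo.
        for step in (20000, 10000, 30000):
            (Path(folder) / "exp1_{}.tar".format(step)).touch()
        # Dummy evaluation that always reports 5 wins.
        for step, rate in win_rate_curve(folder, lambda path, episodes: 5, episodes=10):
            print(step, rate)
```

Each (step, win_rate) point could then be written to TensorBoard, for example with SummaryWriter.add_scalar from torch.utils.tensorboard, to obtain curves like the ones viewed with the tensorboard command above.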
Backgammon OpenAI Gym Environment
For a detailed description of the environment, see gym-backgammon (https://github.com/dellalibera/gym-backgammon).
Bibliography, sources of inspiration, related works
- TD-Gammon and Temporal Difference Learning:
  - [1] Practical Issues in Temporal Difference Learning
  - Temporal Difference Learning and TD-Gammon
  - Programming backgammon using self-teaching neural nets
  - Implementation Details TD-Gammon
  - Chapter 9 Temporal-Difference Learning
  - Implementation Details of the TD(λ) Procedure for the Case of Vector Predictions and Backpropagation
  - Learning to Predict by the Methods of Temporal Differences
- GNU Backgammon: https://www.gnu.org/software/gnubg/
- Rules of Backgammon:
  - www.bkgm.com/rules.html
  - https://en.wikipedia.org/wiki/Backgammon
  - Starting Position: http://www.bkgm.com/gloss/lookup.cgi?starting+position
  - https://bkgm.com/faq/
- Install GNU Backgammon on Ubuntu:
  - https://ubuntuforums.org/showthread.php?t=2217668
  - https://ubuntuforums.org/showthread.php?t=1506341
  - https://www.reddit.com/r/backgammon/comments/5gpkov/installing_gnu_or_xg_on_linux/
- How to use Python to interact with gnubg: [Bug-gnubg] Documentation: Looking for documentation on python scripting
- Other implementations of the Backgammon OpenAI Gym Environment:
  - https://github.com/edusta/gym-backgammon
- Other implementations of TD-Gammon:
  - https://github.com/TobiasVogt/TD-Gammon
  - https://github.com/millerm/TD-Gammon
  - https://github.com/fomorians/td-gammon
- How to set up your VMWare Fusion images to use static IP addresses on Mac OS X:
  - https://gist.github.com/pjkelly/1068716/6d19faa0122c0e1efe350e818bb8f4e8687ea1ab
- PyTorch Tensorboard: https://pytorch.org/docs/stable/tensorboard.html