slitherin
slitherin copied to clipboard
AI research environment for the game of Snake 🐍 .
Slitherin
AI research environment for the game of Snake written in Python 2.7. Part of the OpenAI - Request For Research 2.0.
Check out corresponding Medium articles:
Slitherin - Solving the Classic Game of Snake🐍 with AI🤖 (Part 1: Domain Specific Solvers)
Slitherin - Solving the Classic Game of Snake🐍 with AI🤖 (Part 2: General Purpose Solvers)
Slitherin - Solving the Classic Game of Snake🐍 with AI🤖 (Part 3: Genetic Evolution)
Table of Contents
- Usage
- Rules
-
Modes
-
Domain specific
- Shortest Path BFS
- Shortest Path DFS
- Longest path
- Hamilton
- DNN
- DNN Monte Carlo
-
General purpose
- Human
- Random
- Monte Carlo
- DNN Genetic Evolution
-
Domain specific
- Work in progress
Usage
- Clone the repo.
- Go to the project's root folder.
- Install required packages
pip install -r requirements.txt
. - Launch slitherin. I recommend starting with the help mode to see all available modes
python slitherin.py --help
.
Rules
- Snake has to move either forward, left or right.
- Snake dies when hits wall or itself.
- For every eaten fruit, snake's length increases by 1 and a new fruit is generated on a random unoccupied place.
Modes
All mode previews contain current score Mode Name (min/avg/max). All modes are benchmarked on a 12x12 grid.
Domain specific
Algorithms are using domain specific data like snake's position, direction, neighbors etc.
Shortest Path BFS
python slitherin.py --shortest_path_bfs
data:image/s3,"s3://crabby-images/276ba/276ba62b10471126fc64cec6a874737ece852651" alt=""
data:image/s3,"s3://crabby-images/33984/339847b066ee3feaa4a8f810edc3003e805f77b0" alt=""
Generates the shortest path from the snake’s head to the fruit using BFS algorithm.
Optimal performance during early stages, but as the snake grows, its body creates an unavoidable obstacle for the leading head.
Shortest Path DFS
python slitherin.py --shortest_path_dfs
data:image/s3,"s3://crabby-images/685ee/685eebb48438fc509f3a47ab4d87c75c9f7111a3" alt=""
data:image/s3,"s3://crabby-images/54dcc/54dcc3d65491583139140977945d9460c011e8ab" alt=""
Generates the shortest path from the snake’s head to the fruit using DFS algorithm.
Performs worse than BFS due to the graph’s cyclicity.
Longest path
python slitherin.py --longest_path
data:image/s3,"s3://crabby-images/66a67/66a675840784f85a8d329d1ebbc916fcdc3f14da" alt=""
data:image/s3,"s3://crabby-images/0f7a3/0f7a3992f688e0fd36be175d1e2a1eea53fc0c69" alt=""
Firstly, generates the shortest path (BFS) between the snake’s head and the fruit. Then for each pair of points in the path, tries to extend the distance between them with available actions.
Snake dies when its body is on a generated path.
Hamilton
python slitherin.py --hamilton
data:image/s3,"s3://crabby-images/84c35/84c35e66628c0fc31c6bfa7814bf86d07e31db5c" alt=""
data:image/s3,"s3://crabby-images/d8e23/d8e23abd847c5e9f1f007fa14f7c9755b2aacd53" alt=""
Generates a longest path between the snake’s head and its tail.
In the vast majority of the cases, such path covers the whole environment creating Hamiltonian path, thus solving the game of snake with a perfect score.
DNN
Each Deep Neural Net mode has a same model structure of:
- input layer with 5 neurons [action_vector, left_neighbor, forward_neighbor, right_neighbor, angle_to_fruit]
- hidden layer with 125 neurons (ReLU 6 activation)
- output layer with 1 neuron (value for a given action_vector)
python slitherin.py --deep_neural_net
python slitherin.py --deep_neural_net_trainer
data:image/s3,"s3://crabby-images/b3b05/b3b05d15c96b5420b44e2b6688cc470de4d6c9a3" alt=""
data:image/s3,"s3://crabby-images/2976a/2976a2a7f237a920bb91d5ecafd169a960d3991a" alt=""
Training phase consists of performing random gameplays followed by the evaluation and backpropagation of performed actions and its results.
Rewards:
- 0.7 for eating the fruit
- 0.1 for moving towards the fruit
- -0.2 for moving away of the fruit
- -1.0 for dying
As expected, DNN solver performs well in the early stages. Snake goes straight to the fruit and doesn't go into cycles. However as it gets longer, it starts to have problems with going around itself. With the current model structure (data about only the nearest surroundings), a snake doesn't indicate any sense of 'the whole environment orientation and position'
DNN Monte Carlo
python slitherin.py --deep_neural_net_monte_carlo
data:image/s3,"s3://crabby-images/de92c/de92c84f8c8e230dd72985ad1a4d465b2b3f7bea" alt=""
data:image/s3,"s3://crabby-images/88199/881990078bbb6b9b71434c6dcef09124a91a5289" alt=""
For each possible action, there is a DNN-driven gameplay generated. Gameplay with the highest score is chosen for an ultimate move.
Very slow and inefficient performance in the beginning, but favorable in the late stages. DNN-driven simulations allow the snake to choose relatively wise long-term moves.
General purpose
Algorithms are not using any domain specific data.
Human
python slitherin.py --human
Used for debug, development and fun:).
Random
python slitherin.py --random
data:image/s3,"s3://crabby-images/f1448/f144892d8a55fb96bae3b60f77c8af24db0288a4" alt=""
data:image/s3,"s3://crabby-images/37403/374033dec40c68de449bd59f4880a0a0b5deed47" alt=""
It's always good to start benchmarking against randomness (at least pseudo).
As expected, very low performance.
Monte Carlo
python slitherin.py --monte_carlo
data:image/s3,"s3://crabby-images/ff0e2/ff0e2db14abc4e08c1c5ff54b612a48d6f86bb63" alt=""
data:image/s3,"s3://crabby-images/1b4ab/1b4ab20e6e0f9e5c5e82dc6dc87877ca54ac593c" alt=""
For each move, performs a set of 1000 random run simulations. Then groups them by the initial action and finally picks the action that started gameplays with the highest average score.
Slow and weak performance.
DNN Genetic Evolution
python slitherin.py --deep_neural_net_genetic_evolution
python slitherin.py --deep_neural_net_genetic_evolution_trainer
data:image/s3,"s3://crabby-images/ad77d/ad77d7d4e6875667c68397d64edc2d017801eba4" alt=""
data:image/s3,"s3://crabby-images/f129a/f129ac165e8ca5f0a3a2956709a63c035f8ef9c4" alt=""
Initial population starts with random weights. Then in the selection phase, the top 0.1 of the population gets picked to the uniform crossover stage. In the crossover phase, parents are paired using roulette selection (the highest the score, the highest the probability of breeding). Finally, in the mutation phase, 0.01 of the weights of all offsprings are being mutated to the random values. Then we start again with a new population created fully by the newly bred offsprings. Above cycle is being repeated until convergence which happens usually around 25th generation and the average score of 22.
Performance is relatively satisfactory. Snake correctly learned that taking the shortest path to the fruit isn't a good solution in the late stages, but ultimately still gets trapped within its own body.
Work in progress
Multiplayer (multi-agent) version of slitherin is currently being developed.
Stay tuned!
Author
Greg (Grzegorz) Surma