snake-ai-pytorch
training is slow, why?
Game 484 Score 11 Record: 69
After so many failed games, the record is only 69. Is there any way to improve it, or is a better score even possible?
Increase the tick speed so the games run faster.
@dracularking Include the position of the snake in the inputs and use an LSTM.
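For anyone wondering what the LSTM suggestion might look like, here is a minimal sketch. It is not code from the repo: the class name `RecurrentQNet`, the 13 input features (standing in for the usual state vector plus two position values), and the sequence length of 8 frames are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class RecurrentQNet(nn.Module):
    """Hypothetical LSTM-based Q-network (not part of the original repo)."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # batch_first=True means inputs are shaped (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(x, hidden)
        # Score the actions from the LSTM output at the last time step
        q_values = self.head(out[:, -1, :])
        return q_values, hidden


# Assumed sizes: 13 features per frame, 3 possible actions
net = RecurrentQNet(input_size=13, hidden_size=64, output_size=3)
frames = torch.zeros(1, 8, 13)  # one sequence of 8 consecutive game states
q_values, _ = net(frames)
```

Note that the agent would then have to feed in short sequences of recent states instead of a single state, so the replay memory and `train_step` would need matching changes.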
You can also set the tick speed to 0. It will be hard on your CPU, but it is worth it to get results in under 10 minutes. A tick speed of 0 is also faster than a huge value such as 10000000000000000000000000, presumably because a framerate of 0 tells pygame's `Clock.tick` not to cap the frame rate at all.
I've also improved it by adding an additional layer to the neural network. These lines go in model.py, inside `Linear_QNet.__init__`, after the `super().__init__()` call:
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, hidden_size)
        self.linear3 = nn.Linear(hidden_size, output_size)
You also have to wire the new layer into forward() by hand:
    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = self.linear3(x)  # Output layer, no ReLU here
        return x
These are the parts of the file you need to edit to get a better model. Do not add too many layers, or overfitting may occur.
The entire file would look like this:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import os


class Linear_QNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, hidden_size)
        self.linear3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))  # ReLU activations introduce non-linearity.
        x = self.linear3(x)
        return x

    def save(self, file_name='model.pth'):
        model_folder_path = './model'
        if not os.path.exists(model_folder_path):
            os.makedirs(model_folder_path)
        file_name = os.path.join(model_folder_path, file_name)
        torch.save(self.state_dict(), file_name)


class QTrainer:
    def __init__(self, model, lr, gamma):
        self.lr = lr
        self.gamma = gamma
        self.model = model
        self.optimizer = optim.Adam(model.parameters(), lr=self.lr)
        self.criterion = nn.MSELoss()

    def train_step(self, state, action, reward, next_state, done):
        state = torch.tensor(state, dtype=torch.float)
        next_state = torch.tensor(next_state, dtype=torch.float)
        action = torch.tensor(action, dtype=torch.long)
        reward = torch.tensor(reward, dtype=torch.float)
        # (n, x)
        if len(state.shape) == 1:
            # (1, x)
            state = torch.unsqueeze(state, 0)
            next_state = torch.unsqueeze(next_state, 0)
            action = torch.unsqueeze(action, 0)
            reward = torch.unsqueeze(reward, 0)
            done = (done, )
        # 1: predicted Q values with current state
        pred = self.model(state)
        target = pred.clone()
        for idx in range(len(done)):
            Q_new = reward[idx]
            if not done[idx]:
                Q_new = reward[idx] + self.gamma * torch.max(self.model(next_state[idx]))
            target[idx][torch.argmax(action[idx]).item()] = Q_new
        # 2: Q_new = r + y * max(next_predicted Q value) -> only do this if not done
        # pred.clone()
        # preds[argmax(action)] = Q_new
        self.optimizer.zero_grad()
        loss = self.criterion(target, pred)
        loss.backward()
        self.optimizer.step()
You can copy and paste this code into model.py
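If the per-sample loop in `train_step` turns out to be a bottleneck, one possible variant builds the targets for the whole batch at once. This is only a sketch: the helper name `td_targets` is made up, `done` is assumed to be a boolean tensor instead of a tuple, and the targets are computed under `torch.no_grad()`, so gradients flow only through `pred`. That is the usual Q-learning formulation, but it is slightly different from what the listing above does.

```python
import torch
import torch.nn as nn


def td_targets(model, pred, action, reward, next_state, done, gamma):
    """Build Q-learning targets for a batch without a Python loop.

    action: one-hot tensor of shape (n, num_actions)
    done:   bool tensor of shape (n,)
    """
    with torch.no_grad():
        next_q = model(next_state).max(dim=1).values  # best next Q per sample
        # Bootstrap from the next state only when the game did not end
        q_new = reward + gamma * next_q * (~done).float()
        target = pred.detach().clone()
        idx = action.argmax(dim=1)  # index of the action that was taken
        target[torch.arange(len(idx)), idx] = q_new
    return target


# Usage with a stand-in model (a single Linear layer instead of Linear_QNet)
model = nn.Linear(11, 3)
state = torch.rand(4, 11)
next_state = torch.rand(4, 11)
action = torch.eye(3)[torch.tensor([0, 2, 1, 0])]  # one-hot actions
reward = torch.tensor([0.0, 10.0, 0.0, -10.0])
done = torch.tensor([False, False, False, True])

pred = model(state)
target = td_targets(model, pred, action, reward, next_state, done, gamma=0.9)
loss = nn.functional.mse_loss(pred, target)
loss.backward()
```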
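The biggest speedup, however, may come from running the model on a GPU:
model.to('cuda:0')
Add this line right after the model is created. Every tensor you pass to the model (state, next_state and so on) must be moved to the same device, otherwise PyTorch raises a device mismatch error.
NOTE: You must have a CUDA-compatible GPU to use this method. For a network this small, the gain may be limited.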
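A slightly safer way to do this is to pick the device at runtime and fall back to the CPU when no GPU is available. The sketch below uses a stand-in `nn.Sequential` model with assumed layer sizes (11 inputs, 256 hidden units, 3 outputs), not the actual `Linear_QNet`.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in model; the layer sizes are assumptions for illustration
model = nn.Sequential(nn.Linear(11, 256), nn.ReLU(), nn.Linear(256, 3)).to(device)

# Inputs must live on the same device as the model
state = torch.zeros(1, 11, device=device)
with torch.no_grad():
    q_values = model(state)
```

In `train_step`, the same idea would mean creating the tensors with `device=device` (or calling `.to(device)` on them) before they reach the model.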
You probably need to change the logic and the weights a little for it to work better in the long run. Adding layers or neurons to the model probably isn't the best choice. I implemented the code and noticed that even when the AI does well around games 60-200, it keeps making the same mistakes it made in games 1-10, such as looping in a corner.
I didn't improve the NN. I rebalanced the weights and the randomness parameter so that the AI keeps improving, and changed the logic so that these early issues no longer affect it in the long run.
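The "aleatority" (randomness) parameter is usually the exploration rate, epsilon. The comment above does not say which values were used, so the schedule below is only a sketch with made-up defaults: epsilon decays per game but never drops below a small floor, so the agent keeps trying new moves late in training.

```python
import random


def epsilon_for_game(n_games, start=1.0, floor=0.02, decay=0.98):
    """Exploration rate after n_games, decaying exponentially down to a floor."""
    return max(floor, start * decay ** n_games)


def choose_action(q_values, n_games, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best one."""
    if rng.random() < epsilon_for_game(n_games):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Early on the agent mostly explores; later it mostly follows the network
early = epsilon_for_game(0)
late = epsilon_for_game(1000)
```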
The purpose of this exercise is to introduce these complex algorithms and technologies, so that we can research, study and practice on our own. I suggest you review the basics of ML and NNs to better understand the reinforcement learning concepts behind it, and then improve this code yourselves. It is very simple, and it wasn't made to give us the fish but to encourage us to learn how to fish.
Thanks. Looking back at my code, it seems I should have focused more on the reward system. I am seeing whether I can build what is called a genetic algorithm, which adjusts the weights subtly, perhaps changes the rewards very gradually, and uses natural selection, making sure that at least 5 of the original worm workers are saved and referred to in later generations.
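For reference, one generation of such a genetic algorithm could be sketched as follows. Everything here is an assumption for illustration: each individual is a flat list of weights, `evolve` is a made-up helper, and the toy fitness function stands in for what would really be the score a snake achieves with those weights. The 5 best individuals are carried over unchanged and the rest of the population is refilled with slightly mutated copies of them.

```python
import random


def evolve(population, fitness, n_elite=5, sigma=0.05, rng=random):
    """One generation: keep the n_elite best unchanged, refill with mutated copies."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:n_elite]
    children = []
    while len(elite) + len(children) < len(population):
        parent = rng.choice(elite)
        # Subtle adjustment: add small Gaussian noise to every weight
        children.append([w + rng.gauss(0.0, sigma) for w in parent])
    return elite + children


# Toy run: fitness is higher the closer the weights are to zero
rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
toy_fitness = lambda w: -sum(x * x for x in w)

best_before = max(map(toy_fitness, population))
for _ in range(30):
    population = evolve(population, toy_fitness, rng=rng)
best_after = max(map(toy_fitness, population))
```

Because the elite are copied over untouched, the best fitness can never get worse from one generation to the next.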