DQN does not learn after long training and performs the same as random moves
I have created a Gym wrapper around a snake game I made, and I am using Stable-Baselines3 to train a snake to play the game. I have verified that the environment is functional and supports the interface SB3 expects (using `from stable_baselines3.common.env_checker import check_env`).
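For reference, the check looks roughly like this (SnakeEnv is a placeholder for my actual wrapper class; see the gist for the real code):

```python
from stable_baselines3.common.env_checker import check_env

env = SnakeEnv()  # placeholder name for my custom Gym wrapper
check_env(env)    # raises an error if the env violates the Gym API SB3 expects
```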
I am training a DQN (with 2 hidden layers of 64 units each) for 400k steps. The DQN seems to learn very slowly (ep_rew_mean at step 1: -446, at step 400k: -427; see the TensorBoard log for further details).
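The model setup looks roughly like this (hyperparameters other than the network size are assumptions, not the exact values from the gist):

```python
from stable_baselines3 import DQN

model = DQN(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[64, 64]),  # 2 hidden layers of 64 units each
    tensorboard_log="./tb_logs/",           # assumed path for the TensorBoard log
    verbose=1,
)
```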
System Info:
- OS: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python: 3.7.13
- Stable-Baselines3: 1.6.0
- PyTorch: 1.12.0+cu113
- GPU Enabled: True
- Numpy: 1.21.6
- Gym: 0.21.0
Complete code can be found at: https://gist.github.com/techboy-coder/795e9dc00b743901963c75ede56cf6eb
Reward calculation idea:
- If dead: reward = -500
- If alive: reward = score + delta, where delta = (change in score) * 50 + (previous delta) * 0.2 - 0.05 (the small constant is a per-step penalty to avoid infinite circles)
Reward calculation Python implementation:

```python
# delta is a decaying shaping term: the latest score change is weighted 50x,
# the previous delta carries over at 0.2x, and 0.05 is a constant step penalty
self.delta = (snake.score - self.score) * 50 + self.delta * 0.2 - 0.05
self.score = snake.score
reward = self.delta + snake.score
self.dead = not snake.alive
if self.dead:
    reward = -500  # death overrides all other reward terms
```
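To make the shaping concrete: eating a food item (score 0 -> 1) gives delta = 1*50 + 0*0.2 - 0.05 = 49.95 and reward = 50.95; on the next step without food, delta decays to 49.95*0.2 - 0.05 = 9.94, so reward = 10.94. In context, this logic sits inside the wrapper's step() method, roughly like this (everything except the reward lines uses placeholder names):

```python
def step(self, action):
    snake = self.game.advance(action)  # placeholder: run one tick of the snake game
    self.delta = (snake.score - self.score) * 50 + self.delta * 0.2 - 0.05
    self.score = snake.score
    reward = self.delta + snake.score
    self.dead = not snake.alive
    if self.dead:
        reward = -500
    obs = self._get_obs()  # placeholder: build the observation array
    return obs, reward, self.dead, {}  # Gym 0.21 step() returns a 4-tuple
```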
```python
steps = 400000
with ProgressBarManager(steps) as callback:
    model.learn(total_timesteps=steps, log_interval=200, callback=callback)
```
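The evaluation numbers below come from SB3's evaluate_policy helper, along these lines (n_eval_episodes is an assumption):

```python
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100)
print(f"mean_reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```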
- Evaluation of untrained model: mean_reward: -479.99 +/- 31.81
- Evaluation of trained model: mean_reward: -429.42 +/- 93.32
- Evaluation of model making random moves: mean_reward: -425.72 +/- 117.29
My question:
Why is the model not learning much (it performs about the same as a model making random moves) even after 400k steps? I have played around with the learning rate, network size, and observation size, but haven't seen any big improvement. Do I need to train longer?

Checklist
- [X] I have read the documentation (required)
- [X] I have checked that there is no similar issue in the repo (required)
Unfortunately, the maintainers don't have time to offer technical support. Check the links here to find other forums where you can ask this question. I also suggest that you carefully read the RL Tips and Tricks section in the documentation; it could really help.