
Deep Q Learning Spaceinvaders

Open noobmaster29 opened this issue 6 years ago • 15 comments

I've trained the model for 50 total episodes. However, when I run the last code cell, the action is always the same. I've printed Qs and the action, and the action is always [0 0 0 0 0 0 1 0]. The agent never moves and just dies after 3 lives.

I tested the environment with the following (it basically selects a random action):

choice = np.random.rand(1, 8)
choice = choice[0]
choice = np.argmax(choice)
print(choice)
action = possible_actions[choice]

and the environment renders and the agent dies at around 200 points. So my installation is fine.

Any idea what I'm doing wrong?

noobmaster29 avatar Jan 02 '19 21:01 noobmaster29

I also logged more information on the training. The actions during training are different (agent is trying all the possible actions). Here is the information for the first 2 episodes:

Episode: 0 Total reward: 50.0 Explore P: 0.9880 Training Loss: 2.5707
Episode: 1 Total reward: 110.0 Explore P: 0.9673 Training Loss: 238.0061

After my second training attempt, the agent only performs [1 0 0 0 0 0 0 0].

Why is the agent only repeating one action when during training it is trying all the different actions?
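
For reference, here's my paraphrase of what the notebook's action selection does during training vs. testing (not the exact code; q_values(state) is a hypothetical stand-in for the TensorFlow forward pass):

import numpy as np

# Paraphrase of the notebook's predict_action.
# q_values(state) stands in for sess.run on the Q-network.
def predict_action(q_values, state, possible_actions, explore_p):
    if np.random.rand() < explore_p:
        # Exploration: random one-hot action (this is what you see in training)
        choice = np.random.randint(len(possible_actions))
    else:
        # Exploitation: greedy argmax over the predicted Q-values
        choice = np.argmax(q_values(state))
    return possible_actions[choice]

At test time explore_p is effectively 0, so everything rides on the Q-values. An undertrained network often predicts nearly the same Qs for every state, so the argmax collapses to a single repeated action.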

noobmaster29 avatar Jan 02 '19 21:01 noobmaster29

LOL, third attempt and now it is only generating [0 0 1 0 0 0 0 0]. Is there something wrong with the inference?

noobmaster29 avatar Jan 02 '19 22:01 noobmaster29

@noobmaster29, how did you solve the problem mentioned above? I ran into the same problem.

xiongsenlin avatar Jan 06 '19 12:01 xiongsenlin

I'm having the same problem. The agent always chooses the first action until it dies.

HemaZ avatar Jan 06 '19 14:01 HemaZ

The Space Invaders environment's action_space.sample() returns something like array([0, 1, 0, 1, 1, 1, 1, 0], dtype=int8), so I think taking the argmax during training is not correct.
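
For example (assuming the retro SpaceInvaders-Atari2600 environment from the notebook):

import retro
import numpy as np

env = retro.make(game='SpaceInvaders-Atari2600')
print(env.action_space)           # MultiBinary(8): 8 independent buttons
print(env.action_space.sample())  # e.g. [0 1 0 1 1 1 1 0], a button combination

# The notebook instead restricts itself to one-hot actions (one button at a time):
possible_actions = np.identity(8, dtype=int)
print(possible_actions[2])        # [0 0 1 0 0 0 0 0]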

HemaZ avatar Jan 06 '19 16:01 HemaZ

@xiongsenlin No, unfortunately I have not been able to resolve the issue.

@HemaZ Argmax should be correct: it takes the action with the highest Q-value, sets it to 1, and leaves everything else 0. I'm not sure why there is more than one 1 in your action array.
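
Something like this (with made-up Q-values just for illustration):

import numpy as np

Qs = np.array([0.1, 0.3, 2.7, 0.5, 0.2, 0.0, 0.4, 0.1])  # made-up Q-values
choice = np.argmax(Qs)                      # index of the highest Q-value
action = np.identity(8, dtype=int)[choice]  # one-hot: exactly one button set
print(action)                               # [0 0 1 0 0 0 0 0]

An action vector with several 1s can only come from action_space.sample(), not from this argmax path.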

noobmaster29 avatar Jan 06 '19 19:01 noobmaster29

I'm in the same boat. The model that comes with the project, which I'm assuming is pre-trained, doesn't move either.

Good tip on trying out the random agent and seeing how it performed.

NathanBWaters avatar Jan 15 '19 00:01 NathanBWaters

Yeah, I tried loading the pre-trained network but the agent still doesn't work. Maybe the author could help get the notebook working.

noobmaster29 avatar Jan 15 '19 02:01 noobmaster29

Actually, it's working for me now, but I don't remember what change I made. Maybe I trained it for a little bit longer. Check my implementation and weights: https://github.com/HemaZ/Deep-Reinforcement-Learning/tree/master/DQN

HemaZ avatar Jan 15 '19 15:01 HemaZ

I'll give it another shot.

NathanBWaters avatar Jan 17 '19 22:01 NathanBWaters

The problem is an insufficiently trained network. The only valid actions are: 0 = fire, 6 = left, 7 = right. So if you got action [0 0 0 0 0 0 1 0], that is actually the left action, and the agent never moves because it's already in the left corner.
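
You can check the button mapping yourself; if I remember gym-retro's API correctly, each index of the action vector corresponds to an entry of env.buttons:

import retro

env = retro.make(game='SpaceInvaders-Atari2600')
print(env.buttons)
# For the Atari2600 core this should print something like:
# ['BUTTON', None, 'SELECT', 'RESET', 'UP', 'DOWN', 'LEFT', 'RIGHT']
# i.e. index 0 fires, 6 is left, 7 is right; the rest do nothing in this game.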

emstlk avatar Jan 31 '19 05:01 emstlk

> I'll give it another shot.

Hey Nathan, have you tried again?

ThomasZav avatar Mar 13 '19 10:03 ThomasZav

I did! Nothing came out of it. However, when I tried out the keras-rl library, which implements dueling double DQN, we had some pretty good results.
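
Roughly this setup (a sketch from memory using keras-rl's API; model, nb_actions, and env are assumed to be defined elsewhere, with model being whatever conv net you build over the stacked frames):

from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

# Replay buffer over the last 1M transitions, stacking 4 frames per state
memory = SequentialMemory(limit=1000000, window_length=4)
# Anneal epsilon from 1.0 to 0.1 over 1M steps; small epsilon at test time
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps',
                              value_max=1.0, value_min=0.1,
                              value_test=0.05, nb_steps=1000000)
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               policy=policy, enable_double_dqn=True,
               enable_dueling_network=True, dueling_type='avg')
dqn.compile(Adam(lr=0.00025), metrics=['mae'])
dqn.fit(env, nb_steps=2000000)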

NathanBWaters avatar Mar 21 '19 20:03 NathanBWaters

You mean for Space Invaders? Will you open a repository for it?

ThomasZav avatar Mar 24 '19 16:03 ThomasZav

Any news here?

It looks like the model only does something during training because of the random action picking in the predict_action function.

If I test the trained model, nothing happens. (I trained it for > 50 episodes.)
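
For reference, the test cell boils down to this (paraphrased; q_values(state) is a hypothetical stand-in for the sess.run call on the trained network):

import numpy as np

def run_episode(env, q_values, possible_actions):
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        choice = np.argmax(q_values(state))  # pure greedy, no exploration
        state, reward, done, info = env.step(possible_actions[choice])
        total_reward += reward
        env.render()
    return total_reward

So with exploration gone, a barely trained network just repeats whatever single action its nearly constant Q-values happen to favor.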

AndreasMerz avatar Jul 02 '19 16:07 AndreasMerz