deep-q-learning memory for state

Thanks, Keon, for your great code! I have two questions: 1. What does [0] mean in self.model.predict(next_state)[0] and in return np.argmin(act_values[0])? Does it select the first element of the batch? 2. In addition to batching, I need my state to include the states from the previous K steps. What changes are necessary for that? I want to pass state[i-k+1], ..., state[i-1], state[i] as the state, not just a single state. How can I do this?

Thanks again

Jun 26 '18 09:06 fi000

The Keras Model API tells us that model.predict(predict_batch) returns a numpy array containing one array of predictions for each element in predict_batch. Since model.predict is being called on a batch that holds a single state, next_state, we want the first and only element of the prediction array. Hence the '[0]'.
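To make the shapes concrete, here is a minimal sketch that uses plain numpy instead of a real Keras model. The `predict` function below is a hypothetical stand-in that only mimics the batch-in, batch-out shape behaviour described above; the state size of 4 and the 2 actions are assumptions for illustration.

```python
import numpy as np

def predict(batch):
    """Hypothetical stand-in for model.predict: one row of Q-values per input row."""
    n_features = batch.shape[1]
    # Fixed toy weights mapping n_features inputs to 2 action values (assumption).
    weights = np.arange(n_features * 2, dtype=float).reshape(n_features, 2)
    return batch @ weights

state = np.array([[0.1, 0.2, 0.3, 0.4]])  # shape (1, 4): a batch holding one state
act_values = predict(state)               # shape (1, 2): one row per state in the batch
q_values = act_values[0]                  # shape (2,): Q-values of the only state
action = int(np.argmax(q_values))         # index of the highest Q-value

print(act_values.shape, q_values.shape, action)
```

With a batch of N states the result would have N rows, and `[0]` would pick only the first one.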

Jul 20 '18 17:07 CarterEllsworth

Thanks for the answer to question 1. I still do not understand how to do the second part: how can I give the state a memory? In the Nature paper a memory is used and its length is 4, and this is in addition to using a batch.

Aug 27 '18 08:08 fi000

@fi000 can you provide a link to the paper?

self.model.predict(next_state)[0] predicts on a batch, as @CarterEllsworth pointed out. It returns an array of predictions, one for each element in the batch, but since we're predicting on a single state we only want the first and only prediction, hence the [0].
You could somehow normalize over the last k states as I've implemented here. You'll need to adjust the dimensions according to whatever best suits your task.
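One possible way to feed the last k states to the network, sketched here under assumptions and not taken from the repository's code, is to keep a fixed-length history and concatenate it into one flat input. `StateHistory`, `reset`, `push` and `stacked` are hypothetical names; the sketch assumes a flat state vector and pads the start of an episode by repeating the first state.

```python
from collections import deque

import numpy as np

class StateHistory:
    """Keeps the last k states and exposes them as one flat input row."""

    def __init__(self, k, state_size):
        self.k = k
        self.state_size = state_size
        self.frames = deque(maxlen=k)  # oldest state is dropped automatically

    def reset(self, first_state):
        # At episode start there is no history yet, so repeat the first state k times.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(np.asarray(first_state, dtype=np.float32))
        return self.stacked()

    def push(self, state):
        # Append the newest state and return the updated stacked input.
        self.frames.append(np.asarray(state, dtype=np.float32))
        return self.stacked()

    def stacked(self):
        # Shape (1, k * state_size): a batch of one, ordered oldest to newest.
        return np.concatenate(list(self.frames)).reshape(1, self.k * self.state_size)

history = StateHistory(k=4, state_size=5)
stacked_state = history.reset(np.zeros(5))
stacked_next = history.push(np.ones(5))
print(stacked_state.shape, stacked_next.shape)
```

Under this approach the network's input dimension would be k * state_size (20 for 4 states of 5 inputs) instead of state_size, and the stacked arrays, not the single states, would be what gets stored in the replay memory and passed to model.predict.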

Aug 29 '18 14:08 pskrunner14

@pskrunner14 You can refer to the paper "Playing Atari with Deep Reinforcement Learning", section 4.1, the last sentences of the first paragraph. My state has 5 inputs, but I have a problem feeding, for instance, 4 consecutive states together as one input. How could we do this in this code?

Sep 03 '18 18:09 fi000
