deep-q-learning
memory for state
Thanks, Keon, for your great code! I have two questions:
1- What does `[0]` mean in `self.model.predict(next_state)[0]` and in `return np.argmin(act_values[0])`? Does it select the first element of the batch?
2- In addition to batching, I need my state to include the states from the previous K steps. What change is necessary to do this? I want to send state = state[i-k+1], ..., state[i-1], state[i], not only one state. How can I do this?
Thanks again
The Keras Model API tells us that `model.predict(predict_batch)` returns a numpy array with an array of predictions for each element in `predict_batch`. Since `model.predict` is being called on a single array, `next_state`, we want the first and only element in the prediction array. Hence the `[0]`.
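To make the batch dimension concrete, here is a minimal sketch using only NumPy. The `predict` function is a hypothetical stand-in for `model.predict` (it is not the Keras implementation), and the 4-feature, 2-action sizes are assumptions chosen for illustration. It uses `np.argmax` to pick the action with the highest predicted value.

```python
import numpy as np

def predict(batch):
    # Hypothetical stand-in for model.predict: returns one row of
    # Q-values for every row (state) in the input batch.
    weights = np.arange(8, dtype=float).reshape(4, 2)  # 4 features -> 2 actions
    return batch @ weights

# A "batch" that contains exactly one state: shape (1, 4), not (4,).
next_state = np.array([[0.1, 0.2, 0.3, 0.4]])

act_values = predict(next_state)   # shape (1, 2): one row per state in the batch
q_values = act_values[0]           # shape (2,): Q-values of the only state
action = int(np.argmax(q_values))  # index of the action with the highest Q-value

print(act_values.shape, q_values.shape, action)
```

With a batch of several states, `act_values[0]` would still be only the first row, so the `[0]` is safe here precisely because the batch holds a single state.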
Thanks for the answer to question 1. I still do not understand how to do the second part: how can I keep a memory of past states? In the Nature paper a memory of states is used, and it is equal to 4. This is in addition to using a batch.
@fi000 can you provide a link to said paper?
- `self.model.predict(next_state)[0]` predicts on a batch, as @CarterEllsworth pointed out. It returns an array of predictions for each of the elements in the batch, but since we're only predicting on one state element, we only want the first and only prediction, hence the `[0]`.
- You could somehow normalize over the last `k` states as I've implemented here. You'll need to adjust the dimensions according to whatever best suits your task.
@pskrunner14 You can refer to the paper "Playing Atari with Deep Reinforcement Learning", section 4.1, the last sentences of the first paragraph. I have a state with 5 inputs, but I am having trouble feeding, for instance, 4 states together as one frame. How could we do this in this code?
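One possible way to feed the last 4 states of 5 inputs each is to keep a short history and concatenate it into a single input vector. The sketch below assumes a dense network that takes a flat input; `StateHistory`, `STATE_SIZE`, and `K` are hypothetical names introduced for illustration and are not part of the repository's code. The paper stacks image frames for a convolutional network, so this is an adaptation for vector states, not the paper's exact preprocessing.

```python
from collections import deque

import numpy as np

STATE_SIZE = 5  # inputs per observation (from the question above)
K = 4           # number of most recent states fed to the network

class StateHistory:
    """Keeps the last k observations and exposes them as one network input."""

    def __init__(self, k, state_size):
        self.k = k
        self.state_size = state_size
        self.frames = deque(maxlen=k)  # oldest entry is dropped automatically

    def reset(self, first_state):
        # Fill the history with copies of the first observation so the
        # stacked input has a fixed shape from the very first step.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(np.asarray(first_state, dtype=float))
        return self.stacked()

    def push(self, state):
        self.frames.append(np.asarray(state, dtype=float))
        return self.stacked()

    def stacked(self):
        # Oldest state first, flattened to shape (1, k * state_size):
        # a batch of one, as model.predict expects.
        return np.concatenate(list(self.frames)).reshape(1, self.k * self.state_size)

history = StateHistory(K, STATE_SIZE)
stacked = history.reset(np.zeros(STATE_SIZE))  # start of an episode
stacked = history.push(np.ones(STATE_SIZE))    # after one environment step
print(stacked.shape)
```

Under this assumption the network's input size becomes `K * STATE_SIZE` (20 here) instead of 5, and the stacked arrays, rather than the single observations, are what would be stored in the replay memory as `state` and `next_state`.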