Youtube-Code-Repository

Purpose of Passing New Frame into State Memory with Previous Action

Open ghost opened this issue 4 years ago • 0 comments

Hi Phil, huge fan of your work. I have two questions regarding the TensorFlow policy gradient code for SpaceInvaders:

1. In reinforce_cnn_tf.py, the choose_action function contains this line:

probabilities = self.sess.run(self.actions, feed_dict={self.input: observation})[0]

Here the [0] index seems to select the first of the 4 action probability distributions. If that is the case, then actions are chosen based on the first frame (the 0th observation) of stacked_frames. Is that right?
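To make question 1 concrete, here is a minimal NumPy sketch of how I read that indexing. It assumes the network returns one probability row per stacked frame, with shape (4, 6); batch_probs and its values are made up for illustration and are not taken from the repository, so the real output shape may differ.

```python
import numpy as np

# Hypothetical stand-in for the result of self.sess.run(self.actions, ...).
# Assumption: one softmax row per stacked frame (4 frames, 6 actions).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6))
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
batch_probs = exp / exp.sum(axis=1, keepdims=True)

# [0] keeps only the first row, i.e. the distribution at index 0.
probabilities = batch_probs[0]

# Sample one action from that single distribution.
action = rng.choice(len(probabilities), p=probabilities)
```

If the output really has this shape, the other three rows are never used when picking the action, which is what prompts my second question.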

2. Assuming my first assumption is right, there is this code in main_tf_reinforce_space_invaders.py:

observation, reward, done, info = env.step(action)
observation = preprocess(observation)
stacked_frames = stack_frames(stacked_frames, observation, stack_size)
agent.store_transition(observation, action, reward)  # (this one)

Here the new observation is stored together with the action that was chosen from the 0th observation in stacked_frames. If that is the case, why does training the agent still work? Are the probability distributions computed when the observations are fed in different from the labels?

ghost · Mar 04 '20 14:03