Youtube-Code-Repository
Purpose of Passing New Frame into State Memory with Previous Action
Hi Phil, huge fan of your work. I have two questions about the TensorFlow policy-gradient code for SpaceInvaders:
1. In reinforce_cnn_tf.py, the choose_action function contains this line:
probabilities = self.sess.run(self.actions, feed_dict={self.input: observation})[0]
As I understand it, the [0] selects the first of the 4 probability distributions. If so, your actions are chosen based only on the first frame (the 0th observation) of stacked_frames. Is that right?
2. Assuming my first assumption is right, main_tf_reinforce_space_invaders.py contains these lines:
observation, reward, done, info = env.step(action)
observation = preprocess(observation)
stacked_frames = stack_frames(stacked_frames, observation, stack_size)
agent.store_transition(observation, action, reward)  # <- this one
Here the new observation is stored together with an action that was chosen from the 0th observation in stacked_frames. If that is the case, why does this still work when training the agent? Are the probability distributions produced when the observations are fed in different from the labels?
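For comparison, a sketch of the usual REINFORCE bookkeeping: each stored transition pairs the state the action was sampled from (s_t) with that action (a_t) and the reward it produced, so the gradient term log π(a_t | s_t) · G_t is evaluated at the state the agent actually acted on. EpisodeMemory, run_episode, toy_step and the discount value are hypothetical illustrations, not the repository's code.

```python
import numpy as np


class EpisodeMemory:
    """Hypothetical REINFORCE buffer: one (state, action, reward) triple per step."""

    def __init__(self):
        self.states, self.actions, self.rewards = [], [], []

    def store_transition(self, state, action, reward):
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)

    def discounted_returns(self, gamma=0.99):
        # G_t = r_t + gamma * G_{t+1}, accumulated backwards over the episode
        returns = np.zeros(len(self.rewards))
        running = 0.0
        for t in reversed(range(len(self.rewards))):
            running = self.rewards[t] + gamma * running
            returns[t] = running
        return returns


def run_episode(env_step, choose_action, initial_state, max_steps=100):
    memory = EpisodeMemory()
    state = initial_state
    for _ in range(max_steps):
        action = choose_action(state)  # sampled from the *current* state s_t
        next_state, reward, done = env_step(state, action)
        memory.store_transition(state, action, reward)  # pair a_t with s_t, not s_{t+1}
        state = next_state
        if done:
            break
    return memory


def toy_step(state, action):
    # Hypothetical environment: reward 1 for action 1, episode ends at state 3
    next_state = state + 1
    return next_state, (1.0 if action == 1 else 0.0), next_state >= 3


memory = run_episode(toy_step, lambda s: s % 2, initial_state=0)
print(memory.states)   # [0, 1, 2]
print(memory.actions)  # [0, 1, 0]
print(memory.discounted_returns(gamma=0.5))
```

Storing next_state in place of state would shift every action onto the state that followed it, which is the mismatch the question above is asking about.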