Deep_reinforcement_learning_Course

Why do we have self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)?

Open Meur-sault opened this issue 6 years ago • 0 comments

Hi Thomas,

(Since this issue was marked resolved without a proper answer, I'm submitting it again.) I don't understand why we apply tf.reduce_sum and multiply the network output by the actions.

self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)

Why aren't we treating self.output as the predicted Q-value?

Jun 16 '19 13:06 Meur-sault
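
To make the question concrete, here is a minimal pure-Python sketch of what that line appears to compute. It assumes self.output holds one Q-value per action with shape (batch, n_actions) and that self.actions_ is a one-hot encoding of the action taken in each sample. Neither assumption is confirmed in this thread, and the function name and sample values are hypothetical.

```python
# Pure-Python stand-in for:
#   tf.reduce_sum(tf.multiply(output, actions_), axis=1)
# Assumption: output is (batch, n_actions) and actions_ is one-hot per row.

def q_of_taken_action(output, actions_one_hot):
    q = []
    for q_row, a_row in zip(output, actions_one_hot):
        # Element-wise product zeroes every entry except the taken action's.
        masked = [qv * av for qv, av in zip(q_row, a_row)]
        # Summing over the action axis leaves that single Q-value.
        q.append(sum(masked))
    return q

# Hypothetical batch of 2 states with 3 possible actions each.
output = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
actions_ = [[0, 1, 0],   # action 1 taken in the first state
            [0, 0, 1]]   # action 2 taken in the second state

q_taken = q_of_taken_action(output, actions_)
print(q_taken)  # [2.0, 6.0]
```

Under those assumptions, self.output has one value per action, while self.Q has one value per sample: the predicted Q-value of the action that was actually taken. That would be the quantity to compare against a per-sample target in the loss.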
