async_deep_reinforce
why recalculate pi and v?
Hello, in game_ac_network.py, in def prepare_loss(self, entropy_beta), you have:
# temporary difference (R-V) (input for policy)
self.td = tf.placeholder("float", [None])
value_loss = 0.5 * tf.nn.l2_loss(self.r - self.v)
But td == self.r - self.v, right?
So why not use self.td directly instead of recomputing it from self.v? And for pi, why not pass it in as a placeholder as well?
Hoping for a reply, thanks.
Because self.td is a number (or batch of numbers) fed in from outside, it is a constant with respect to the network: using it in the policy gradient ensures the policy loss does not backpropagate through the value output. self.r - self.v, on the other hand, is used to compute the critic loss, which does need gradients to flow into V so the critic can learn. Feeding pi as a placeholder would not work for the same reason in reverse: the policy loss needs gradients to flow through pi.
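To make the distinction concrete, here is a toy sketch (not code from the repository) with a linear critic, worked out by hand instead of with TensorFlow. It shows that the value loss has a nonzero gradient through V, while td, being fed in as a constant, contributes no gradient through V in the policy loss:

```python
# Toy linear critic V(s) = w * s, illustrating why td is fed in as a
# constant (a placeholder) for the policy loss, while the value loss
# recomputes R - V so gradients can flow into the critic.
# All names here are illustrative, not from game_ac_network.py.

w = 0.5          # critic parameter
s = 2.0          # state feature
R = 3.0          # discounted return

V = w * s        # critic estimate, V = 1.0
td = R - V       # advantage, fed to the graph as a constant

# Value loss 0.5 * (R - V)^2: differentiating w.r.t. w gives
# -(R - V) * dV/dw = -(R - V) * s, so the critic is updated.
value_loss_grad_w = -(R - V) * s

# Policy loss -log(pi) * td: td is a fed-in constant, so its
# derivative w.r.t. w is zero -- no gradient leaks into the critic.
policy_td_grad_w = 0.0

print(value_loss_grad_w)   # -4.0
print(policy_td_grad_w)    # 0.0
```

If td were instead wired up as self.r - self.v inside the policy loss, the policy gradient would also push on the critic's weights, which is not what the actor-critic update calls for.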