DeepRL-Agents

A3C Basic Doom: Loss Function

IbrahimSobh opened this issue on Mar 18, 2017 · 1 comment

Hi

Our goal is to minimize the loss, which consists of three parts:

  • Value loss
  • Policy loss
  • Entropy (to encourage exploration)

As follows:


# value loss: squared error between the bootstrapped target and the predicted value
self.value_loss = 0.5 * tf.reduce_sum(tf.square(self.target_v - tf.reshape(self.value, [-1])))
# (summed) entropy of the policy outputs, used to encourage exploration
self.entropy = - tf.reduce_sum(self.policy * tf.log(self.policy))
# policy gradient loss: log-probability of the taken actions weighted by the advantages
self.policy_loss = -tf.reduce_sum(tf.log(self.responsible_outputs) * self.advantages)
# total loss: weighted combination of the three terms
self.loss = 0.5 * self.value_loss + self.policy_loss - self.entropy * 0.01

Last line: self.loss = 0.5 * self.value_loss + self.policy_loss - self.entropy * 0.01

I think we do not need to multiply self.value_loss by 0.5 in the last line, correct?
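For concreteness: since value_loss already contains a 0.5, the extra 0.5 in the total loss makes the effective coefficient on the summed squared error 0.25. A minimal numpy sketch with made-up numbers, just to show the arithmetic:

import numpy as np

target_v = np.array([1.0, 2.0, 3.0])   # hypothetical discounted returns
value = np.array([0.5, 2.5, 2.0])      # hypothetical value predictions

value_loss = 0.5 * np.sum(np.square(target_v - value))    # inner 0.5, as in self.value_loss
print(0.5 * value_loss)                                   # outer 0.5, as in self.loss
print(0.25 * np.sum(np.square(target_v - value)))         # same number: effective coefficient is 0.25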

IbrahimSobh avatar Mar 18 '17 10:03 IbrahimSobh

The 0.5 is the coefficient of the value loss; there is another post that discusses this.

However, why is there a minus sign in - self.entropy * 0.01, given that the loss formula is

Loss = Loss_policy + c_value * Loss_value + c_entropy * Loss_entropy 
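I notice that self.entropy in the code is the positive Shannon entropy, so - self.entropy * 0.01 is numerically the same as adding a negative-entropy term, i.e. + 0.01 * sum(policy * log(policy)). A small numpy sketch with a made-up policy, just to show the signs:

import numpy as np

policy = np.array([0.7, 0.2, 0.1])                  # hypothetical action probabilities
entropy = -np.sum(policy * np.log(policy))          # positive Shannon entropy, as in self.entropy
print(-0.01 * entropy)                              # the term that appears in self.loss
print(0.01 * np.sum(policy * np.log(policy)))       # identical value, written as + c_entropy * (negative entropy)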

And

self.value_loss = 0.5 * tf.reduce_sum(tf.square(self.target_v - tf.reshape(self.value,[-1])))

Why is the 0.5 needed here?
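I suppose the 0.5 might just be the usual 1/2 convention in squared-error losses, so the gradient comes out without a factor of 2, but I am not sure that is the author's reason. A tiny sketch with made-up numbers:

import numpy as np

target, v = 2.0, 1.5
# gradient of 0.5 * (target - v)**2 with respect to v is simply -(target - v)
analytic_grad = -(target - v)
eps = 1e-6
numeric_grad = (0.5 * (target - (v + eps))**2 - 0.5 * (target - v)**2) / eps
print(analytic_grad, numeric_grad)   # both approximately -0.5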

GoingMyWay avatar Jun 20 '17 10:06 GoingMyWay