trpo
About kl_firstfixed
Thanks for the implementation of TRPO. There are a few details that don't make sense to me yet.
I can't see why kl_firstfixed is defined as follows:
kl_firstfixed = tf.reduce_sum(tf.stop_gradient( action_dist_n) * tf.log(tf.stop_gradient(action_dist_n + eps) / (action_dist_n + eps))) / Nf
It seems we don't make use of oldaction_dist at all.
Shouldn't it be
kl_firstfixed = tf.reduce_sum(tf.stop_gradient( oldaction_dist) * tf.log(tf.stop_gradient(oldaction_dist + eps) / (action_dist_n + eps))) / Nf
?
Besides, why do the losses contain the entropy of action_dist_n? Why must it be minimized?
Sorry, I mean I think it should be
kl_firstfixed = tf.reduce_sum(tf.stop_gradient( oldaction_dist) * tf.log(tf.stop_gradient(oldaction_dist + eps) / (oldaction_dist + eps))) / Nf
All right, after a quick analysis, I think it's reasonable to use the first definition of kl_firstfixed. The quantity itself is identically zero, but only its second derivatives are needed, and at the current parameters its Hessian equals the Fisher information matrix, which is what the Fisher-vector product for conjugate gradient requires. Still, I'm confused about the losses: why do we try to minimize three values?
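For reference, here is a small numerical sketch of that analysis. It is plain numpy, not the repo's TensorFlow code, and it assumes a softmax-parametrized categorical distribution as a stand-in for action_dist_n: the "first-argument-fixed" KL (mirroring tf.stop_gradient(action_dist_n)) has zero value and zero gradient at the current parameters, yet its Hessian there is exactly the Fisher matrix diag(p) - p p^T.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_firstfixed(theta, theta0):
    # KL(p(theta0) || p(theta)) with the first argument held fixed,
    # mirroring tf.stop_gradient(action_dist_n) in the quoted code.
    p0, p = softmax(theta0), softmax(theta)
    return np.sum(p0 * np.log(p0 / p))

rng = np.random.default_rng(0)
theta0 = rng.normal(size=4)     # hypothetical current policy parameters
p0 = softmax(theta0)
h = 1e-5

# Finite-difference gradient at theta = theta0: it vanishes, because
# d/dtheta of -sum_i p0_i log p_i(theta) is -sum_i dp_i/dtheta = 0
# (probabilities always sum to 1).
grad = np.array([
    (kl_firstfixed(theta0 + h * e, theta0)
     - kl_firstfixed(theta0 - h * e, theta0)) / (2 * h)
    for e in np.eye(4)
])
assert np.allclose(grad, 0.0, atol=1e-6)

# Finite-difference Hessian at theta0: it matches the Fisher
# information matrix of the softmax, diag(p0) - p0 p0^T.
H = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        ei, ej = np.eye(4)[i] * h, np.eye(4)[j] * h
        H[i, j] = (kl_firstfixed(theta0 + ei + ej, theta0)
                   - kl_firstfixed(theta0 + ei - ej, theta0)
                   - kl_firstfixed(theta0 - ei + ej, theta0)
                   + kl_firstfixed(theta0 - ei - ej, theta0)) / (4 * h * h)
fisher = np.diag(p0) - np.outer(p0, p0)
assert np.allclose(H, fisher, atol=1e-4)
print("gradient at theta0 is zero; Hessian matches the Fisher matrix")
```

This also shows why the second suggested definition (oldaction_dist in both arguments) wouldn't work: it has no dependence on the trainable parameters at all, so every derivative of it is exactly zero.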