PeiYingjun
Exactly, I'm trying to rewrite the code
Sorry, I mean I think it should be `kl_firstfixed = tf.reduce_sum(tf.stop_gradient(oldaction_dist) * tf.log(tf.stop_gradient(oldaction_dist + eps) / (oldaction_dist + eps))) / Nf`
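To make the role of the `stop_gradient` terms concrete, here is a minimal plain-Python sketch of the same expression. It is not the repository's code: the logits `theta0`, the `softmax` helper, and the finite-difference check are assumptions made up for illustration, and the fixed copy of the distribution stands in for `tf.stop_gradient(oldaction_dist)`. The point it tries to show is that the value and the first derivative are (numerically) zero at the current parameters, while the curvature is not.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_firstfixed(p_fixed, p, eps=1e-8):
    # sum(p_fixed * log((p_fixed + eps) / (p + eps))) / Nf,
    # where p_fixed is treated as a constant (the stop_gradient copy).
    nf = len(p)
    total = 0.0
    for row_fixed, row in zip(p_fixed, p):
        for q, r in zip(row_fixed, row):
            total += q * math.log((q + eps) / (r + eps))
    return total / nf

theta0 = [0.2, -0.1, 0.4]      # hypothetical logits for a single state
p_old = [softmax(theta0)]      # held fixed, like the stop_gradient term

def kl_at(delta):
    # Perturb only the first logit of the "live" distribution.
    theta = [theta0[0] + delta, theta0[1], theta0[2]]
    return kl_firstfixed(p_old, [softmax(theta)])

h = 1e-3
value = kl_at(0.0)                                        # zero at the current parameters
grad = (kl_at(h) - kl_at(-h)) / (2 * h)                   # first derivative, close to zero
curv = (kl_at(h) - 2 * kl_at(0.0) + kl_at(-h)) / (h * h)  # second derivative, nonzero

print(value, grad, curv)
```

In this toy setup the curvature should come out close to `p0 * (1 - p0)` for the perturbed logit's probability `p0`, which is the Fisher information of a softmax with respect to that logit. So an expression of this form is only useful through its second derivatives, not through its value.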
All right, after a quick analysis, I think it's reasonable to use the first definition of kl_first, but I'm still confused about the losses: why do we try to...
> Thank you for your interest! We didn't update our recent version to our master branch.
> You should go to dLSTM branch dlstm_a2c folder to check the new one....