TRPO-TensorFlow kl.pen = 0

Open atishsawant opened this issue 6 years ago • 0 comments

Hey,

I was running an implementation of your code, and kl_pen seems to always be zero. It looks like that's because oldlog_vars and log_vars are the same. How did you get around that? If the two are the same, the gradient is zero, and the hvp function then fails with a division by zero.
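
To make the situation concrete, here is a minimal sketch in plain Python (not the repository's code; `gaussian_kl` and `kl_of_mean_shift` are hypothetical helpers, and the means and log-variances are made-up numbers). It assumes a diagonal Gaussian policy and checks by finite differences that when the old and new parameters coincide, both the KL and its gradient vanish, while the curvature that a Hessian-vector product relies on does not.

```python
import math

def gaussian_kl(old_mean, old_log_var, new_mean, new_log_var):
    """KL(old || new) for diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for m0, lv0, m1, lv1 in zip(old_mean, old_log_var, new_mean, new_log_var):
        kl += 0.5 * (lv1 - lv0
                     + (math.exp(lv0) + (m0 - m1) ** 2) / math.exp(lv1)
                     - 1.0)
    return kl

def kl_of_mean_shift(shift, mean, log_var):
    """KL when only the first component of the new mean is shifted."""
    new_mean = [mean[0] + shift] + list(mean[1:])
    return gaussian_kl(mean, log_var, new_mean, log_var)

# Illustrative values only (assumption, not taken from the issue).
mean = [0.3, -1.2]
log_var = [-0.5, 0.1]
eps = 1e-3

# Old and new parameters identical: the KL itself is zero.
kl_at_old = gaussian_kl(mean, log_var, mean, log_var)

# Central finite difference: first derivative at the old parameters.
grad = (kl_of_mean_shift(eps, mean, log_var)
        - kl_of_mean_shift(-eps, mean, log_var)) / (2 * eps)

# Second finite difference: curvature at the old parameters.
curvature = (kl_of_mean_shift(eps, mean, log_var)
             - 2 * kl_at_old
             + kl_of_mean_shift(-eps, mean, log_var)) / eps ** 2

print(kl_at_old, grad, curvature)
```

In this sketch the curvature along the mean equals 1/variance, so it stays non-zero even though the KL and its gradient are zero. If the curvature also came out as zero in a real graph, one possibility (an assumption, not something confirmed here) is that the old parameters are the same tensors as the new ones instead of fixed values, which would make the KL identically zero.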

Mar 19 '18 00:03 atishsawant
