TRPO-TensorFlow
kl.pen = 0
Hey,
I was running an implementation of your code, and kl_pen always seems to be zero. It looks like that's because oldlog_vars and log_vars are the same. How did you get around that? If the two are the same, the gradient is zero, and then the hvp function fails with a division by zero.
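
A minimal NumPy sketch of the situation described above, assuming a diagonal Gaussian policy and looking only at derivatives with respect to the new mean. The function names, the test values, and the `damping` term are all hypothetical and are not taken from the repository. It shows that when the old and new parameters are identical, the KL and its gradient are both zero, while the Hessian-vector product does not have to be, provided the old parameters are held constant.

```python
import numpy as np

def gaussian_kl(old_mean, old_log_var, new_mean, new_log_var):
    """KL(old || new) for diagonal Gaussians, summed over dimensions."""
    old_var = np.exp(old_log_var)
    new_var = np.exp(new_log_var)
    return 0.5 * np.sum(
        new_log_var - old_log_var
        + (old_var + (old_mean - new_mean) ** 2) / new_var
        - 1.0
    )

def kl_grad_wrt_new_mean(old_mean, old_log_var, new_mean, new_log_var):
    """Analytic gradient of the KL w.r.t. new_mean, old side held constant."""
    return (new_mean - old_mean) / np.exp(new_log_var)

def kl_hvp_wrt_new_mean(new_log_var, vector, damping=0.0):
    """Hessian-vector product w.r.t. new_mean; the Hessian is diag(1 / new_var).

    `damping` is an assumed extra term that adds damping * vector to the product.
    """
    return vector / np.exp(new_log_var) + damping * vector

# Old and new parameters are identical, as in the report above.
mean = np.array([0.3, -1.2])
log_var = np.array([-0.5, 0.1])

kl_value = gaussian_kl(mean, log_var, mean, log_var)
kl_grad = kl_grad_wrt_new_mean(mean, log_var, mean, log_var)

# Curvature along an arbitrary search direction, as used in the
# denominator of a conjugate gradient step.
direction = np.array([1.0, 2.0])
hvp = kl_hvp_wrt_new_mean(log_var, direction, damping=0.1)
curvature = direction @ hvp

print("KL:", kl_value)
print("KL gradient:", kl_grad)
print("direction . H . direction:", curvature)
```

If the curvature term also comes out as zero in a real run, one possible cause (an assumption, not something confirmed here) is that the old parameters are not being treated as constants when the second derivative is taken, so the KL is identically zero as a function of the parameters.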