Update documentation on how to use RNNs with TF/Torch [pending]
The error a contributor got when using `CategoricalGRUPolicy` with TRPO on the TF branch, while computing backward passes, was:
```
tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'optimize/hx_plain/gradients_hx_plain/ConjugateGradientOptimizer/update_opt_mean_kl/gradients_constraint/policy_1/gru/rnn_2/while_grad/policy_1/gru/rnn_2/while_grad_grad/ConjugateGradientOptimizer/update_opt_mean_kl/gradients_constraint/policy_1/gru/rnn_2/while_grad/policy_1/gru/rnn_2/while_grad_grad':
Connecting to invalid output 78 of source node ConjugateGradientOptimizer/update_opt_mean_kl/gradients_constraint/policy_1/gru/rnn_2/while_grad/policy_1/gru/rnn_2/while_grad which has 78 outputs.
Try using tf.compat.v1.experimental.output_all_intermediates(True)
```
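For reference, the workaround that the error message itself suggests is a TF-level switch that makes functional control-flow ops (like the `tf.while_loop` inside the GRU's unroll) expose their intermediate tensors, so higher-order gradients can be wired through them. A minimal sketch, assuming graph-mode TF 2.x with v1 compatibility (which is how the TF branch runs):

```python
import tensorflow as tf

# The TF branch builds v1-style graphs; disable eager execution if your
# script has not already done so (assumption about your entry point).
tf.compat.v1.disable_eager_execution()

# TF's suggested workaround: make control-flow ops output all
# intermediates. Must be called before the graph is constructed.
tf.compat.v1.experimental.output_all_intermediates(True)

# ... construct the policy, algorithm, and session after this point ...
```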
@krzentner was able to fix the contributor's error by adding the following argument to TRPO's optimizer:
```python
optimizer_args=dict(hvp_approach=FiniteDifferenceHVP(base_eps=1e-5))
```
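For context, here is roughly how that fix slots into a full experiment. This is a hedged sketch against garage's TF-branch API circa 2020: `TFTrainer`, `GymEnv`, `LinearFeatureBaseline`, and the exact `garage.tf.*` import paths are assumptions that have shifted between releases, and the class is spelled `FiniteDifferenceHvp` in some versions.

```python
from garage import wrap_experiment
from garage.envs import GymEnv
from garage.np.baselines import LinearFeatureBaseline
from garage.tf.algos import TRPO
from garage.tf.optimizers import (ConjugateGradientOptimizer,
                                  FiniteDifferenceHVP)
from garage.tf.policies import CategoricalGRUPolicy
from garage.trainer import TFTrainer


@wrap_experiment
def trpo_gru_cartpole(ctxt=None):
    """TRPO with a GRU policy; import paths/signatures are assumptions."""
    with TFTrainer(snapshot_config=ctxt) as trainer:
        env = GymEnv('CartPole-v1')
        policy = CategoricalGRUPolicy(name='policy', env_spec=env.spec)
        baseline = LinearFeatureBaseline(env_spec=env.spec)
        algo = TRPO(
            env_spec=env.spec,
            policy=policy,
            baseline=baseline,
            discount=0.99,
            optimizer=ConjugateGradientOptimizer,
            # The fix: approximate Hessian-vector products by finite
            # differences instead of differentiating a second time
            # through the GRU's while_loop.
            optimizer_args=dict(
                hvp_approach=FiniteDifferenceHVP(base_eps=1e-5)))
        trainer.setup(algo, env)
        trainer.train(n_epochs=100, batch_size=4000)


trpo_gru_cartpole()
```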
Is there a reason why we would need this? Is this specific to TRPO, and if so, can we modify TRPO to have this by default?
If the CG optimizer can't be used with RNNs (I don't think that's actually the case), we should detect that and raise an error.
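If it does turn out that the default Hessian-vector-product approach is the incompatible piece, the check could be as simple as failing fast when a recurrent policy meets it. A hypothetical sketch (the `recurrent` attribute, the `PearlmutterHvp` import path, and the helper itself are assumptions, not garage's actual API):

```python
from garage.tf.optimizers import PearlmutterHvp  # assumed import path


def check_hvp_supports_policy(policy, hvp_approach):
    """Raise early instead of failing deep inside graph construction.

    Hypothetical helper: garage does not currently ship this check.
    """
    # Assumption: recurrent policies expose a truthy `recurrent` attribute.
    if getattr(policy, 'recurrent', False) and isinstance(
            hvp_approach, PearlmutterHvp):
        raise ValueError(
            "ConjugateGradientOptimizer's default PearlmutterHvp cannot "
            'differentiate twice through tf.while_loop-based RNN policies. '
            'Pass optimizer_args=dict(hvp_approach='
            'FiniteDifferenceHVP(base_eps=1e-5)) instead.')
```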