tensorflow-rl
Can't test CartPole-v0 model trained with TRPO
I tried to use TRPO to train a model for CartPole-v0 by following the instructions on your OpenAI Gym page. I changed the command to the following to reflect the API changes made since the score was submitted:
python main.py CartPole-v0 --alg_type trpo --td_lambda 1.0 --cg_damping .05 --episodes_per_batch 25 -n 2 -v 0 --arch FC --trpo_max_rollout 1000 --max_kl .05 --history_length 1 --frame_skip 1 --activation tanh --num_epochs 40
This seems to work, with training proceeding as expected and concluding successfully. However, when I try to evaluate the trained model by running
python main.py CartPole-v0 --alg_type trpo -n 1 --test --restore_checkpoint
I get the following error.
[2017-05-25 16:16:16,587] Error reported to Coordinator: <type 'exceptions.ValueError'>, Cannot feed value of shape (1, 4, 4) for Tensor u'policy_network_0/input:0', which has shape '(?, 84, 84, 4)'
Process TRPOLearner-1:
Traceback (most recent call last):
  File "/home/abiolalapite/.pyenv/versions/2.7.13/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/abiolalapite/Code/ThirdParty/tensorflow-rl/algorithms/actor_learner.py", line 256, in run
    self.test()
  File "/home/abiolalapite/Code/ThirdParty/tensorflow-rl/algorithms/actor_learner.py", line 181, in test
    a = self.choose_next_action(s)[0]
  File "/home/abiolalapite/Code/ThirdParty/tensorflow-rl/algorithms/trpo_actor_learner.py", line 148, in choose_next_action
    return self.policy_network.get_action(self.session, state)
  File "/home/abiolalapite/Code/ThirdParty/tensorflow-rl/networks/policy_v_network.py", line 78, in get_action
    self.logits], feed_dict=feed_dict)
  File "/home/abiolalapite/.pyenv/versions/py2713/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/abiolalapite/.pyenv/versions/py2713/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 961, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 4, 4) for Tensor u'policy_network_0/input:0', which has shape '(?, 84, 84, 4)'
The TRPO learner currently isn't being checkpointed like the other algorithms, but I'll try to get a fix for that in tonight. Also note that, at present, you need to explicitly pass any flags that modify the architecture or agent behavior at test time as well. In this case that would be --arch FC --history_length 1 --activation tanh --frame_skip 1
For now, I would recommend using the --use_monitor flag during training for any solved environments, since the primary objective in those environments is to minimize the number of training episodes needed to solve them.
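To illustrate the shape mismatch reported in the traceback, here is a minimal sketch in plain Python. It is not TensorFlow's actual check, and is_feed_compatible is a hypothetical helper written only for this example. The two shapes are copied from the error message; the (?, 84, 84, 4) placeholder is presumably what the network is built with when the architecture flags are left out at test time.

```python
def is_feed_compatible(value_shape, placeholder_shape):
    """Return True if a value of value_shape could be fed to a placeholder.

    Hypothetical helper for illustration only. None in placeholder_shape
    stands for an unknown dimension (printed as '?' in the error message).
    """
    # The number of dimensions must match before comparing sizes.
    if len(value_shape) != len(placeholder_shape):
        return False
    # Every known placeholder dimension must equal the fed dimension.
    return all(p is None or p == v
               for v, p in zip(value_shape, placeholder_shape))


# Shapes taken from the ValueError above.
fed_state = (1, 4, 4)
input_placeholder = (None, 84, 84, 4)

print(is_feed_compatible(fed_state, input_placeholder))
```

The check fails on rank alone: the fed state has three dimensions while the placeholder expects four, so the test-time network and the state being fed were evidently configured differently.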
Hey there. By any chance, do you still have plans for the following?
The TRPO learner currently isn't being checkpointed like the other algorithms, but I'll try to get a fix for that in tonight.
Thanks!
I do. I've been swamped with work recently, so I forgot to take care of this, but I should have time to fix it this weekend.