reinforcement-learning Test the policy in "Value Iteration" exercise

Open link2xt opened this issue 5 years ago • 1 comment

Otherwise, value_iteration can return a correct value function together with an incorrect policy and still pass the test.

Jun 23 '19 23:06 link2xt

Note that I evaluate the returned policy and compare its value function with the expected one, rather than comparing the policy itself, because there can be multiple optimal policies.

Jun 23 '19 23:06 link2xt
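
The check described above could look roughly like the sketch below. It is not the repository's actual test: the three-state MDP, the `P[s][a] = [(prob, next_state, reward, done), ...]` layout, and the helper names `evaluate_policy` and `policy_achieves_value` are all assumptions made for illustration. The idea is to run iterative policy evaluation on the returned policy and compare the resulting values with the expected optimal values, so that any optimal policy passes and a suboptimal one fails.

```python
# Hypothetical MDP: P[s][a] is a list of (prob, next_state, reward, done).
# State 0: both actions are equally good, so two optimal policies exist.
# State 1: action 1 (move to state 0) is better than action 0 (stop with 0).
# State 2: terminal.
P = {
    0: {0: [(1.0, 2, 1.0, True)], 1: [(1.0, 2, 1.0, True)]},
    1: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 0, 0.0, False)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
GAMMA = 0.9
EXPECTED_V = [1.0, 0.9, 0.0]  # optimal values for this toy MDP


def evaluate_policy(policy, P, gamma, theta=1e-10):
    """Iterative policy evaluation; policy[s][a] is an action probability."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = 0.0
            for a, transitions in P[s].items():
                for prob, next_state, reward, done in transitions:
                    # No bootstrapping from the next state after termination.
                    future = 0.0 if done else gamma * V[next_state]
                    v += policy[s][a] * prob * (reward + future)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_achieves_value(policy, expected_v, P, gamma, tol=1e-6):
    """True if the policy's own value function matches expected_v."""
    v = evaluate_policy(policy, P, gamma)
    return all(abs(x - y) <= tol for x, y in zip(v, expected_v))


# Two different optimal policies and one suboptimal policy (one-hot rows).
optimal_a = [[1, 0], [0, 1], [1, 0]]
optimal_b = [[0, 1], [0, 1], [0, 1]]
suboptimal = [[1, 0], [1, 0], [1, 0]]
```

Comparing `optimal_a` and `optimal_b` element by element would reject one of them even though both are optimal; comparing their evaluated values accepts both and still rejects `suboptimal`.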