reinforcement-learning
reinforcement-learning copied to clipboard
Test the policy in "Value Iteration" exercise
Otherwise value_iteration is allowed to return correct value function but incorrect policy
Note that I evaluate returned policy and compare its value, because there are multiple optimal policies.