policy_improvement() should be renamed to policy_iteration()
In the DP directory there is a notebook, Policy Iteration.ipynb. It contains a function policy_improvement() which returns the optimal policy and its value function. In the book this algorithm is called "Policy Iteration" (see p. 80), while policy improvement is only the third step within it.
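To illustrate the naming distinction, here is a minimal sketch of how the pieces could be split. It is not the notebook's actual code: the helper names policy_evaluation() and policy_improvement_step(), the transition table P (where P[s][a] is a list of (prob, next_state, reward, done) tuples), the toy two-state MDP, and the default gamma=0.9 are all assumptions made for this example.

```python
import numpy as np

def policy_evaluation(policy, P, n_states, n_actions, gamma=0.9, theta=1e-8):
    """Step 2: iteratively estimate V for a fixed policy."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Expected one-step return under the current policy.
            v = sum(policy[s, a] * prob * (reward + gamma * V[s2] * (not done))
                    for a in range(n_actions)
                    for prob, s2, reward, done in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def policy_improvement_step(V, P, n_states, n_actions, gamma=0.9):
    """Step 3: make the policy greedy with respect to V."""
    policy = np.zeros((n_states, n_actions))
    for s in range(n_states):
        q = [sum(prob * (reward + gamma * V[s2] * (not done))
                 for prob, s2, reward, done in P[s][a])
             for a in range(n_actions)]
        policy[s, int(np.argmax(q))] = 1.0
    return policy

def policy_iteration(P, n_states, n_actions, gamma=0.9):
    """Full algorithm: alternate evaluation and improvement until stable."""
    # Step 1: initialization with a uniform random policy.
    policy = np.ones((n_states, n_actions)) / n_actions
    while True:
        V = policy_evaluation(policy, P, n_states, n_actions, gamma)
        new_policy = policy_improvement_step(V, P, n_states, n_actions, gamma)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Hypothetical toy MDP: in state 0, action 0 stays put (reward 0) and
# action 1 moves to the terminal state 1 (reward 1).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]},
    1: {0: [(1.0, 1, 0.0, True)], 1: [(1.0, 1, 0.0, True)]},
}
policy, V = policy_iteration(P, n_states=2, n_actions=2)
```

With this split, the outer function carries the name the book uses for the whole algorithm, and "policy improvement" refers only to the greedy update inside the loop.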