
policy_improvement() should be renamed to policy_iteration()

Open link2xt opened this issue 5 years ago • 0 comments

The DP directory contains a notebook, Policy Iteration.ipynb, whose function policy_improvement() returns the optimal policy and its value function. In the book this algorithm is called "Policy Iteration" (see p. 80), while policy improvement is just the third step inside it.
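
To illustrate the naming, here is a minimal sketch of how the pieces could fit together. It is not the notebook's code: the transition format `P[s][a] = [(prob, next_state, reward, done), ...]`, the helper `policy_evaluation()`, the discount `gamma=0.9`, and the three-state toy MDP are all assumptions made for this example. The point is only that the outer loop is policy iteration, and policy improvement is one step inside it.

```python
import numpy as np


def policy_evaluation(policy, P, gamma=0.9, theta=1e-8):
    """Iteratively compute the value function of a fixed policy."""
    n_states = len(P)
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v = 0.0
            for a, action_prob in enumerate(policy[s]):
                for prob, next_state, reward, done in P[s][a]:
                    # Terminal transitions contribute no future value.
                    v += action_prob * prob * (reward + gamma * V[next_state] * (not done))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_iteration(P, n_actions, gamma=0.9):
    """Alternate evaluation and improvement until the policy is stable."""
    n_states = len(P)
    # Step 1: initialization (uniform random policy).
    policy = np.ones((n_states, n_actions)) / n_actions
    while True:
        # Step 2: policy evaluation.
        V = policy_evaluation(policy, P, gamma)
        # Step 3: policy improvement (greedy with respect to V).
        policy_stable = True
        for s in range(n_states):
            old_action = np.argmax(policy[s])
            q = np.zeros(n_actions)
            for a in range(n_actions):
                for prob, next_state, reward, done in P[s][a]:
                    q[a] += prob * (reward + gamma * V[next_state] * (not done))
            best_action = np.argmax(q)
            if best_action != old_action:
                policy_stable = False
            policy[s] = np.eye(n_actions)[best_action]
        if policy_stable:
            return policy, V


# Hypothetical toy MDP: states 0 and 1 are non-terminal, state 2 is terminal.
# Action 0 moves left, action 1 moves right; every non-terminal move costs -1.
P = [
    [[(1.0, 0, -1.0, False)], [(1.0, 1, -1.0, False)]],
    [[(1.0, 0, -1.0, False)], [(1.0, 2, -1.0, True)]],
    [[(1.0, 2, 0.0, True)], [(1.0, 2, 0.0, True)]],
]
policy, V = policy_iteration(P, n_actions=2)
print(policy)
print(V)
```

Under these assumptions the greedy policy moves right from both non-terminal states. Renaming the notebook's function to policy_iteration() would match this structure, with the improvement step kept as an inner part of the loop.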

link2xt, Jun 23 '19 20:06