
Provided policy_improvement() solution is not guaranteed to terminate

Open link2xt opened this issue 5 years ago • 1 comments

To set the policy_stable variable, the provided code checks whether the policy has changed. If there are multiple optimal policies, the policy may keep switching between them indefinitely, so the loop never terminates even though an optimal policy has already been found.

See Exercise 4.4 in the 2018 edition of the Sutton & Barto book, which explicitly points out this bug in the pseudocode.
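One possible way to avoid the oscillation is to treat the policy as changed only when the current action is strictly worse than the best one. The sketch below is an illustration, not the repository's code: `improve_policy`, the deterministic list-of-actions policy, the `q[s][a]` table of action values, and the `tol` parameter are all hypothetical names and assumptions, and the action values are taken as given instead of being computed by a one-step lookahead.

```python
def improve_policy(policy, q, tol=1e-9):
    """One greedy improvement step that tolerates ties.

    policy: list with one action index per state (assumed deterministic).
    q:      q[s][a] is the action value of action a in state s (assumed given).
    Returns the new policy and a policy_stable flag.
    """
    new_policy = list(policy)
    policy_stable = True
    for s, action_values in enumerate(q):
        best_value = max(action_values)
        # Keep the current action if it is already (near-)optimal,
        # so switching between tied actions never counts as a change.
        if action_values[policy[s]] < best_value - tol:
            new_policy[s] = action_values.index(best_value)
            policy_stable = False
    return new_policy, policy_stable


# State 0 has two equally good actions; state 1 has a single best action.
q = [[1.0, 1.0], [0.0, 2.0]]

policy, stable = improve_policy([1, 0], q)
print(policy, stable)    # state 1 is improved, state 0 keeps its tied action

policy2, stable2 = improve_policy(policy, q)
print(policy2, stable2)  # nothing left to improve, so the flag is True
```

Comparing values instead of action indices means a tie in state 0 cannot flip policy_stable back to False, which is the failure mode described above.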

link2xt avatar Jun 23 '19 22:06 link2xt

Also see the related issue #202 about the naming of the function.

link2xt avatar Jun 23 '19 22:06 link2xt