reinforcement-learning Provided policy_improvement() solution is not guaranteed to terminate

Open link2xt opened this issue 5 years ago • 1 comments

To set the `policy_stable` variable, the provided code checks whether the policy has changed. If there are multiple optimal policies, the policy may keep switching between them indefinitely even though an optimal policy has already been found.

See Exercise 4.4 in the 2018 edition of the Sutton & Barto book, which explicitly points out this bug in the pseudocode.
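
One possible fix, as a minimal sketch rather than a patch against the provided solution: only switch the action in a state when another action is strictly better (beyond a small tolerance), so ties between equally good actions never mark the policy as unstable. The names `one_step_lookahead`, `improve_policy`, and `tol` are hypothetical, the policy is assumed to be a deterministic array of action indices, and `P[s][a]` is assumed to be a list of `(prob, next_state, reward, done)` tuples.

```python
import numpy as np

def one_step_lookahead(P, state, V, gamma=1.0):
    """Return the action values q(state, a) for every action under V."""
    n_actions = len(P[state])
    q = np.zeros(n_actions)
    for a in range(n_actions):
        for prob, next_state, reward, done in P[state][a]:
            q[a] += prob * (reward + gamma * V[next_state])
    return q

def improve_policy(P, V, policy, gamma=1.0, tol=1e-9):
    """One greedy improvement sweep; returns (new_policy, policy_stable).

    The current action is kept unless some other action beats it by
    more than tol, so equally good actions do not flip the policy.
    """
    new_policy = policy.copy()
    policy_stable = True
    for s in range(len(P)):
        q = one_step_lookahead(P, s, V, gamma)
        best = int(np.argmax(q))
        if q[best] > q[policy[s]] + tol:
            new_policy[s] = best
            policy_stable = False
    return new_policy, policy_stable

# Toy MDP (made up for illustration): both actions in state 0 are equally good.
P = {
    0: {0: [(1.0, 1, 1.0, True)], 1: [(1.0, 1, 1.0, True)]},
    1: {0: [(1.0, 1, 0.0, True)], 1: [(1.0, 1, 0.0, True)]},
}
V = np.array([1.0, 0.0])
policy = np.array([1, 0])  # picks action 1 in state 0, argmax would pick 0

new_policy, policy_stable = improve_policy(P, V, policy)
```

Comparing the chosen action index against `argmax` would report a change in state 0 here, while comparing values leaves the policy untouched and reports it as stable.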

Jun 23 '19 22:06 link2xt

Also see the related issue #202 about the naming of the function.

Jun 23 '19 22:06 link2xt
