dissecting-reinforcement-learning
Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog
Hello, I am attempting to run the function "main_linalg()" in policy_iteration.py but the program fails to terminate. The iterative policy evaluation with the standard policy iteration program returns the correct...
This is an excellent example. However, when I tried the linear algebra approach from the MDP post, the while loop never terminates.
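For context on the linear algebra approach mentioned in the reports above, here is a minimal sketch, not the repository's code. It assumes a fixed policy whose utilities satisfy `u = r + gamma * P_pi u`, so they can be obtained by solving `(I - gamma * P_pi) u = r`. The function names, the two-state MDP, and the `max_iter` cap are all hypothetical. The loop stops when the policy no longer changes rather than when floating-point utilities match exactly; whether that relates to the reported hang is not established here.

```python
import numpy as np

def evaluate_policy_linalg(P, r, policy, gamma):
    """Solve (I - gamma * P_pi) u = r for the utilities of a fixed policy.

    P has shape (n_actions, n_states, n_states); P[a, s, s2] is the
    probability of moving from s to s2 under action a (assumed layout).
    """
    n_states = r.shape[0]
    # Row s of P_pi is the transition row of the action the policy picks in s.
    P_pi = P[policy, np.arange(n_states), :]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r)

def policy_iteration(P, r, gamma=0.9, max_iter=100):
    n_states = r.shape[0]
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):  # hard cap guards against endless looping
        u = evaluate_policy_linalg(P, r, policy, gamma)
        # Greedy improvement: P @ u has shape (n_actions, n_states).
        new_policy = np.argmax(P @ u, axis=0)
        if np.array_equal(new_policy, policy):
            break  # policy is stable, so evaluation cannot change further
        policy = new_policy
    return policy, u

# Hypothetical toy MDP: action 0 stays put, action 1 swaps the two states.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
r = np.array([0.0, 1.0])  # only state 1 is rewarding
policy, u = policy_iteration(P, r, gamma=0.9)
```

Comparing policies (integer arrays) gives an exact stopping test, while the iteration cap bounds the run time if two equally good policies keep alternating.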
In the `setPosition` [function](https://github.com/mpatacchiola/dissecting-reinforcement-learning/blob/c25b3a4708db0567e0ecbeab48ba0aac6d5395cd/src/4/gridworld.py#L98), at [line 102](https://github.com/mpatacchiola/dissecting-reinforcement-learning/blob/c25b3a4708db0567e0ecbeab48ba0aac6d5395cd/src/4/gridworld.py#L102), there are two undefined variables (`tot_row` and `tot_col`).
I believe that in part 3, TD(lambda), the trace_matrix should be reset to zeros at the beginning of each epoch. Otherwise the utility of a state may be updated even...
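The suggestion in the report above, resetting the traces at the start of each epoch, could look roughly like the following. This is a sketch of tabular TD(lambda) with accumulating traces, not the repository's implementation. The function name `td_lambda`, the `(state, reward, next_state, done)` transition format, and the sample episodes are assumptions made for illustration.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lambda_=0.5):
    """Tabular TD(lambda) over pre-recorded episodes (hypothetical helper)."""
    utility = np.zeros(n_states)
    for episode in episodes:
        # Reset the traces so eligibility from a previous episode
        # cannot carry over into this one.
        trace = np.zeros(n_states)
        for state, reward, next_state, done in episode:
            # Terminal transitions bootstrap from zero.
            target = reward + (0.0 if done else gamma * utility[next_state])
            delta = target - utility[state]
            trace[state] += 1.0               # accumulating trace
            utility += alpha * delta * trace  # update all eligible states
            trace *= gamma * lambda_          # decay every trace
    return utility

# Two hypothetical episodes that visit disjoint sets of states.
first = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
second = [(3, 5.0, 2, True)]
after_first = td_lambda([first], n_states=4)
after_both = td_lambda([first, second], n_states=4)
```

With the reset in place, the second episode leaves the utilities of states 0 and 1 exactly as they were after the first episode, since neither state is visited again.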