dissecting-reinforcement-learning issues

4 results


Part.1 Modified Policy Iteration with Simplified Bellman Equation and Linear Algebra Policy Evaluation Infinite Loop

Hello, I am trying to run `main_linalg()` in `policy_iteration.py`, but the program never terminates. The iterative policy evaluation in the standard policy iteration program returns the correct...

CesarAndresRojas
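
For anyone hitting the same non-terminating loop, here is a minimal sketch of policy iteration with linear-algebra policy evaluation. It is not the repository's code. It assumes a tabular MDP with a per-state reward vector `r`, a transition tensor `T[s, s2, a]`, and `gamma < 1`, which keeps `I - gamma * P_pi` invertible. The function names are hypothetical. The loop exits as soon as the policy stops changing, and a state switches action only on a strict improvement, so ties between equally good actions cannot make the policy cycle. Whether ties are the actual cause of the reported loop is an assumption, not something the issue confirms.

```python
import numpy as np


def evaluate_policy_linalg(policy, T, r, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) u = r for u.

    T[s, s2, a] is the probability of reaching s2 from s with action a,
    r[s] is the reward of state s, policy[s] is the action taken in s.
    """
    n_states = len(r)
    # Row s of P_pi is the transition distribution of the action chosen in s
    P_pi = np.array([T[s, :, policy[s]] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r)


def policy_iteration_linalg(T, r, gamma, max_iter=1000, tol=1e-10):
    """Policy iteration that stops as soon as the policy is stable."""
    n_states = T.shape[0]
    policy = np.zeros(n_states, dtype=int)
    for iteration in range(1, max_iter + 1):
        u = evaluate_policy_linalg(policy, T, r, gamma)
        # q[s, a] = sum over s2 of T[s, s2, a] * u[s2]
        q = np.einsum("ijk,j->ik", T, u)
        new_policy = policy.copy()
        for s in range(n_states):
            best = int(np.argmax(q[s]))
            # Switch only on a strict improvement so that ties between
            # equally good actions cannot make the policy oscillate forever
            if q[s, best] > q[s, policy[s]] + tol:
                new_policy[s] = best
        if np.array_equal(new_policy, policy):
            return policy, u, iteration
        policy = new_policy
    # Hard cap as a safety net instead of an unbounded while loop
    raise RuntimeError("policy did not stabilize within max_iter iterations")
```

As an example, take a two-state MDP where action 0 stays in place and action 1 moves to the other state, with `r = [0, 1]` and `gamma = 0.9`. The sketch returns the policy `[1, 0]` (move to the rewarding state, then stay) with utilities `[9, 10]` after two iterations.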

mdp linear algebra approach cannot stop

This is an excellent example. However, when I tried the linear algebra approach from the MDP post, the while loop never terminates.

zdarktknight

Two undefined variables

In the `setPosition` [function](https://github.com/mpatacchiola/dissecting-reinforcement-learning/blob/c25b3a4708db0567e0ecbeab48ba0aac6d5395cd/src/4/gridworld.py#L98), at [line 102](https://github.com/mpatacchiola/dissecting-reinforcement-learning/blob/c25b3a4708db0567e0ecbeab48ba0aac6d5395cd/src/4/gridworld.py#L102), there are two undefined variables (`tot_row` and `tot_col`).

DoDzilla-ai
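
One way to avoid undefined names like these is to store the grid dimensions on the instance in the constructor and read them from `self` inside `setPosition`. The class below is a hypothetical minimal sketch, not the repository's `GridWorld`. The attribute names `world_row` and `world_col` and the random-start behaviour when no index is given are assumptions made for illustration.

```python
import random


class GridWorld:
    """Hypothetical minimal grid world used only to illustrate the fix."""

    def __init__(self, tot_row, tot_col):
        # Keep the dimensions on the instance so every method can reach them
        # (attribute names are assumptions, not taken from the repository)
        self.world_row = tot_row
        self.world_col = tot_col
        self.position = [0, 0]

    def setPosition(self, index_row=None, index_col=None):
        """Set the agent position, or pick a random cell if no index is given."""
        if index_row is None or index_col is None:
            # Use self.world_row / self.world_col instead of the undefined
            # local names tot_row / tot_col
            self.position = [random.randrange(self.world_row),
                             random.randrange(self.world_col)]
        else:
            self.position = [index_row, index_col]
```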

Part 3, TD(lambda): trace_matrix should be reset to zeroes at the beginning of each epoch

I believe that in Part 3, TD(lambda), `trace_matrix` should be reset to zeros at the beginning of each epoch. Otherwise, the utility of a state may be updated even...

johanwiden
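
The reset the issue proposes can be seen in a minimal tabular TD(lambda) sketch with accumulating eligibility traces. It assumes a flat utility vector instead of the repository's grid-shaped `utility_matrix` and `trace_matrix`. Each episode is given as a hypothetical list of `(state, reward, next_state, done)` transitions, and the function name `td_lambda` is likewise hypothetical. Because the traces are zeroed when each episode starts, a state visited only in an earlier episode receives no credit for rewards observed later.

```python
import numpy as np


def td_lambda(episodes, n_states, alpha=0.1, gamma=0.99, lam=0.5):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Each episode is a list of (state, reward, next_state, done) tuples
    (a hypothetical format chosen for this sketch).
    """
    utility = np.zeros(n_states)
    for episode in episodes:
        # Reset the traces at the start of every epoch so that credit from
        # the previous episode cannot leak into this one
        trace = np.zeros(n_states)
        for state, reward, next_state, done in episode:
            # Terminal transitions bootstrap from zero
            target = reward if done else reward + gamma * utility[next_state]
            delta = target - utility[state]
            trace *= gamma * lam   # decay every trace
            trace[state] += 1.0    # accumulate for the visited state
            utility += alpha * delta * trace
    return utility
```

With the reset, a first episode that visits only state 0 (reward 0) followed by a second that visits only state 1 (reward 1) leaves `utility[0]` at 0. Without it, state 0 would still carry a trace of `gamma * lam` into the second episode and be updated as well.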
