Tabular Reinforcement Learning Algorithms with Python

Python implementations of the tabular RL algorithms in Sutton & Barto 2017 (Reinforcement Learning: An Introduction), using only NumPy and basic Python data structures (list, tuple, set, and dictionary) to build both the environment and the algorithms.

Algorithms learn from a 4x4 grid world environment (from Sutton & Barto 2017, p. 61)

Tabular Reinforcement Learning Algorithms with NumPy

Visualizations with Seaborn (Policy & Value function)

0. MDP Environment (Chapter 3, Sutton & Barto 2017)

Introduction to gridworld environment
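
As a rough illustration of what such an environment can look like, here is a minimal sketch. It assumes the 4x4 grid follows the book's small gridworld example: states numbered 0 to 15 row by row, the two opposite corners terminal, four deterministic moves, a reward of -1 per step, and off-grid moves leaving the agent in place. The names `N`, `TERMINALS`, `ACTIONS`, and `step` are illustrative and not taken from this repository.

```python
# Assumed layout: 16 states numbered row by row, terminal corners 0 and 15.
N = 4                                   # grid side length
TERMINALS = {0, N * N - 1}              # assumption: top-left and bottom-right corners
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right


def step(state, action):
    """Return (next_state, reward, done) for one deterministic move."""
    if state in TERMINALS:
        # Terminal states absorb: no movement and no further reward.
        return state, 0.0, True
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    # Clamping keeps the agent in place when a move would leave the grid.
    new_row = min(max(row + d_row, 0), N - 1)
    new_col = min(max(col + d_col, 0), N - 1)
    next_state = new_row * N + new_col
    return next_state, -1.0, next_state in TERMINALS
```

Under these assumptions, `step(1, 2)` moves left from state 1 into the terminal corner, while `step(3, 0)` bumps into the top wall and stays in state 3.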

1. Dynamic Programming (Chapter 4, Sutton & Barto 2017)

Policy Evaluation and Improvement
Policy Iteration
Value Iteration
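
To make the dynamic programming entries concrete, the following is a small sketch of value iteration. It is not the repository's code: it assumes the same 4x4 grid as the book's small gridworld example (terminal corners, reward -1 per step, undiscounted, deterministic moves), and the names `next_state` and `value_iteration` are hypothetical.

```python
import numpy as np

N = 4
TERMINALS = {0, N * N - 1}                     # assumption: corner states are terminal
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right


def next_state(state, move):
    """Deterministic transition; off-grid moves leave the state unchanged."""
    row, col = divmod(state, N)
    row = min(max(row + move[0], 0), N - 1)
    col = min(max(col + move[1], 0), N - 1)
    return row * N + col


def value_iteration(gamma=1.0, theta=1e-8):
    """In-place value iteration; returns state values and a greedy policy."""
    V = np.zeros(N * N)
    while True:
        delta = 0.0
        for s in range(N * N):
            if s in TERMINALS:
                continue
            # Bellman optimality backup: best one-step lookahead value.
            best = max(-1.0 + gamma * V[next_state(s, m)] for m in MOVES)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    policy = np.zeros(N * N, dtype=int)
    for s in range(N * N):
        if s not in TERMINALS:
            q = [-1.0 + gamma * V[next_state(s, m)] for m in MOVES]
            policy[s] = int(np.argmax(q))       # ties go to the lowest action index
    return V, policy


V, policy = value_iteration()
print(V.reshape(N, N))
```

With these assumptions, each optimal value is the negative of the number of steps to the nearest terminal corner.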

2. Monte Carlo Methods (Chapter 5, Sutton & Barto 2017)

Monte Carlo Prediction
Monte Carlo Exploring Starts
On-policy Monte Carlo
Off-policy Monte Carlo
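
As an illustration of the first item in this list, here is a sketch of first-visit Monte Carlo prediction for the equiprobable random policy. It assumes the 4x4 grid from the book's small gridworld example (terminal corners, reward -1 per step, undiscounted) and uniformly random start states. The function names and default parameters are illustrative choices, not the repository's.

```python
import numpy as np

N = 4
TERMINALS = {0, N * N - 1}                     # assumption: corner states are terminal
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right


def next_state(state, move):
    """Deterministic transition; off-grid moves leave the state unchanged."""
    row, col = divmod(state, N)
    row = min(max(row + move[0], 0), N - 1)
    col = min(max(col + move[1], 0), N - 1)
    return row * N + col


def mc_prediction(num_episodes=5000, gamma=1.0, seed=0):
    """First-visit Monte Carlo estimate of V for the uniform random policy."""
    rng = np.random.default_rng(seed)
    nonterminal = [s for s in range(N * N) if s not in TERMINALS]
    V = np.zeros(N * N)
    counts = np.zeros(N * N)
    for _ in range(num_episodes):
        # Generate one episode from a random non-terminal start state.
        state = int(rng.choice(nonterminal))
        visited = []
        while state not in TERMINALS:
            visited.append(state)
            state = next_state(state, MOVES[rng.integers(len(MOVES))])
        # Walk backwards accumulating the return; overwriting the dict entry
        # means the value left for each state is the one from its first visit.
        G = 0.0
        first_visit_return = {}
        for s in reversed(visited):
            G = gamma * G - 1.0                 # every step has reward -1
            first_visit_return[s] = G
        for s, g in first_visit_return.items():
            counts[s] += 1
            V[s] += (g - V[s]) / counts[s]      # incremental sample average
    return V


V = mc_prediction()
print(np.round(V.reshape(N, N), 1))
```

The estimates are sample averages, so they vary with the seed and the number of episodes.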

3. Temporal Difference Learning (Chapter 6, Sutton & Barto 2017)

TD Prediction
SARSA - On-policy Control
Q-learning - Off-policy Control
Double Q-learning - Off-policy Control
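
The sketch below illustrates tabular Q-learning from this list with an epsilon-greedy behaviour policy. It assumes the 4x4 grid from the book's small gridworld example (terminal corners, reward -1 per step, undiscounted). The hyperparameters and the names `step` and `q_learning` are assumptions for illustration only.

```python
import numpy as np

N = 4
TERMINALS = {0, N * N - 1}                     # assumption: corner states are terminal
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right


def step(state, action):
    """Return (next_state, reward); off-grid moves leave the state unchanged."""
    row, col = divmod(state, N)
    row = min(max(row + MOVES[action][0], 0), N - 1)
    col = min(max(col + MOVES[action][1], 0), N - 1)
    return row * N + col, -1.0


def q_learning(num_episodes=5000, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = np.random.default_rng(seed)
    nonterminal = [s for s in range(N * N) if s not in TERMINALS]
    Q = np.zeros((N * N, len(MOVES)))          # terminal rows stay at zero
    for _ in range(num_episodes):
        state = int(rng.choice(nonterminal))
        while state not in TERMINALS:
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))   # explore
            else:
                action = int(np.argmax(Q[state]))        # exploit
            next_s, reward = step(state, action)
            # Off-policy target: bootstrap from the greedy value of the next state.
            target = reward + gamma * np.max(Q[next_s])
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_s
    return Q


Q = q_learning()
print(np.round(Q.max(axis=1).reshape(N, N), 2))
```

SARSA differs only in the target: it bootstraps from the action actually selected in the next state instead of the greedy maximum.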

4. n-step Bootstrapping (Chapter 7, Sutton & Barto 2017)

n-step TD Prediction
n-step SARSA - On-policy Control
n-step Off-policy learning by Importance Sampling
n-step Off-policy learning without Importance Sampling
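
As a sketch of the first entry, n-step TD prediction can be written as below for the equiprobable random policy. It assumes the 4x4 grid from the book's small gridworld example (terminal corners, reward -1 per step, undiscounted). The step size, the value of n, and the function names are illustrative assumptions, not the repository's settings.

```python
import numpy as np

N = 4
TERMINALS = {0, N * N - 1}                     # assumption: corner states are terminal
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right


def next_state(state, move):
    """Deterministic transition; off-grid moves leave the state unchanged."""
    row, col = divmod(state, N)
    row = min(max(row + move[0], 0), N - 1)
    col = min(max(col + move[1], 0), N - 1)
    return row * N + col


def n_step_td(n=3, alpha=0.05, gamma=1.0, num_episodes=3000, seed=0):
    """n-step TD estimate of V for the uniform random policy."""
    rng = np.random.default_rng(seed)
    nonterminal = [s for s in range(N * N) if s not in TERMINALS]
    V = np.zeros(N * N)
    for _ in range(num_episodes):
        states = [int(rng.choice(nonterminal))]
        rewards = [0.0]                        # rewards[t] holds R_t; index 0 is unused
        T = np.inf                             # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                s_next = next_state(states[t], MOVES[rng.integers(len(MOVES))])
                states.append(s_next)
                rewards.append(-1.0)
                if s_next in TERMINALS:
                    T = t + 1
            tau = t - n + 1                    # time step whose estimate is updated
            if tau >= 0:
                last = min(tau + n, T)
                # Discounted sum of up to n rewards following time tau.
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, last + 1))
                if tau + n < T:
                    # Bootstrap from the current estimate n steps ahead.
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V


V = n_step_td()
print(np.round(V.reshape(N, N), 1))
```

Setting `n=1` recovers one-step TD prediction, and a very large `n` behaves like a constant-step-size Monte Carlo update.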
