TorchingUp
Why TorchingUp Exists
TorchingUp provides minimal implementations of common Reinforcement Learning algorithms written in PyTorch. It is meant to complement OpenAI's SpinningUp repository, which contains similar algorithms implemented in TensorFlow.
The repository is built with a pedagogical mindset - the point is to help you learn RL as efficiently as possible. If you’ve been following the tutorials on SpinningUp, then TorchingUp is a natural way to start implementing common RL algorithms from scratch on your own.
To keep you focused on what’s important - learning - we follow a set of code design principles (see below) that are consistent across the various algorithms.
Installation
Installation is simple. The code is compatible with Python 3 and has minimal dependencies. Set up a virtual environment called `torchingup` with conda or virtualenv, and run
pip install -r requirements.txt
to install the dependencies. If you don't have PyTorch installed, you'll need to do that first. Also, note that the MuJoCo environments such as `InvertedPendulum` require you to have a MuJoCo license. If you're a student, it's free; otherwise, you can sign up for the free trial. If you don't have a license, that's OK too - just use a different gym environment.
Once you have everything set up, run
python algos/{algo_name}/{algo_name}.py
to launch an algorithm. For example, to run the DQN, type
python algos/dqn/dqn.py
in your command line.
What’s Included
Currently, TorchingUp supports the following algorithms:
- Deep Q Network (Off-Policy)
- Deep Deterministic Policy Gradient (Off-Policy)
- REINFORCE (On-Policy)
- Vanilla Policy Gradient (On-Policy)
- Proximal Policy Optimization (On-Policy)
Support for TRPO, TD3, ACER, Soft Actor-Critic, and Hindsight Experience Replay is coming soon.
Code Design
Reinforcement Learning (and, more generally, Deep Learning) algorithms are particularly susceptible to bugs. The most common type of bug is silent: your code runs, but the agent doesn't learn anything. You then spend hours or days backtracking to find the mistake. To minimize these bug-hunting digressions, this repository structures code in a principled way.
Split File Structure - `{algo_name}.py` and `core.py`
Each algorithm is split into an `{algo_name}.py` file (e.g. `dqn.py` for DQNs) and a `core.py` file. The code essential to understanding the algorithm lives in the `{algo_name}.py` file, while `core.py` contains auxiliary utility functions.
- `{algo_name}.py` - code fundamental to understanding the algorithm. It has the following structure:
  - Initialization - initialize the environment, networks, and replay buffer
  - Gradient Update Rule - define a gradient update function (e.g. `dqn_update()`). This is the most important part of the algorithm - it defines the optimization step.
  - Training Loop - collects experience, applies the update rule according to the algorithm, and logs the output at each epoch
- `core.py` - contains utility classes and functions needed to implement the algorithm, such as:
  - Neural Network classes (e.g. Q network, Policy network, Value network)
  - Replay Buffer classes
  - Environment Wrappers
  - Loggers
  - Miscellaneous utilities (e.g. hard and soft updates for off-policy target networks; see the sketch below)
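As a rough illustration, the target-network utilities mentioned above might look something like the following sketch (written here from scratch for illustration, not copied from the repository):

```python
import torch
import torch.nn as nn


def hard_update(target: nn.Module, source: nn.Module) -> None:
    """Copy the source network's parameters into the target network verbatim."""
    target.load_state_dict(source.state_dict())


def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005) -> None:
    """Polyak-average the source parameters into the target network:
    theta_target <- tau * theta_source + (1 - tau) * theta_target
    """
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * s_param)
```

Keeping helpers like these in `core.py` means the main algorithm file stays focused on the update rule and the training loop.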
Hyperparameters Stored in `config.json`
Since RL algorithms can have many hyperparameters, it's important to keep them in one place, separate from everything else. Each algorithm has an accompanying `config.json` file with hyperparameters. These are then loaded at the end of the `{algo_name}.py` file. When you want to play with hyperparameters, just edit the `config.json` file and rerun the algorithm.
The Gradient Update Function
Each file has a standalone gradient update function. The reason is that the optimization step is the single most important piece of any RL algorithm. Pay close attention to what’s happening in this function to understand how the algorithm works.
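To give a concrete sense of what such a function looks like, here is a minimal DQN-style update sketch. The signature, tensor layout, and use of a mean-squared TD error are assumptions for illustration, not the repository's exact code:

```python
import torch
import torch.nn.functional as F


def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One DQN optimization step on a sampled batch of transitions.
    `batch` is assumed to hold tensors: obs, act, rew, next_obs, done."""
    obs, act, rew, next_obs, done = batch

    # Q(s, a) for the actions actually taken
    q_values = q_net(obs).gather(1, act.long().unsqueeze(1)).squeeze(1)

    # Bootstrapped TD target, computed without tracking gradients
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q

    loss = F.mse_loss(q_values, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Everything else in the file (environment interaction, logging, buffering) exists to feed batches into a function like this one.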
Modular, Shallow Classes
Note that all of the classes are modular and shallow. Instead of implementing a master `Agent` class with tons of functionality, we keep our classes:
- Shallow - at most one level of inheritance, and preferably no inheritance at all
- Modular - each class does one thing
This practice keeps the code clean and minimal, and makes debugging far more transparent than it would be otherwise.
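For example, a replay buffer written in this style is a small, flat class that does exactly one job. The sketch below is a hypothetical illustration of the principle, not the repository's exact implementation:

```python
import random
from collections import deque

import numpy as np
import torch


class ReplayBuffer:
    """A flat, single-purpose class: store transitions and sample minibatches.
    No inheritance, and no agent or update logic mixed in."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, act, rew, next_obs, done):
        self.buffer.append((obs, act, rew, next_obs, float(done)))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        obs, act, rew, next_obs, done = map(np.array, zip(*batch))
        return (torch.as_tensor(obs, dtype=torch.float32),
                torch.as_tensor(act),
                torch.as_tensor(rew, dtype=torch.float32),
                torch.as_tensor(next_obs, dtype=torch.float32),
                torch.as_tensor(done, dtype=torch.float32))

    def __len__(self):
        return len(self.buffer)
```

Because the class knows nothing about the agent or the update rule, a bug in sampling or storage is easy to isolate and test on its own.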
Clear Documentation for Each Algorithm
Finally, each algorithm has a `README.md` file that provides a concise explanation of the algorithm. Every RL algorithm can be implemented in a number of subtly different ways, and not knowing which variant is being implemented is a common roadblock to efficient learning. That makes clear documentation an important part of any RL repository.
That’s about it! Enjoy, Misha Laskin