
Generic reinforcement learning codebase in TensorFlow

FOR.ai Reinforcement Learning Codebase

A modular codebase for training, testing and visualizing reinforcement learning models.

Contributors: Bryan M. Li, Alexander Cowen-Rivers, Piotr Kozakowski, David Tao, Siddhartha Rao Kamalakara, Nitarshan Rajkumar, Hariharan Sezhiyan, Sicong Huang, Aidan N. Gomez

Features

  • Agents: DQN, Vanilla Policy Gradient, DDPG, PPO
  • Environments:
    • OpenAI Gym
      • supports both Discrete and Box environments
      • render (--render) and save (--record_video) environment replay
    • OpenAI Atari
    • OpenAI ProcGen
  • Model-free asynchronous training (--num_workers)
  • Memory replay: Simple, and Proportional Prioritized Experience Replay (a sketch follows this list)
  • Modularized
    • hyper-parameters setting (--hparams)
    • action functions
    • compute gradient functions
    • advantage estimation
    • learning rate schemes
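
To make the replay options concrete, here is a minimal sketch of the proportional prioritized variant (Schaul et al., 2015), assuming a NumPy circular buffer. The class and parameter names are illustrative, not the codebase's actual memory implementation:

import numpy as np

class ProportionalReplay:
    # Illustrative proportional prioritized experience replay.
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps      # keeps every priority strictly positive
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0        # next write position (circular buffer)

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        self.priorities[idx] = np.abs(td_errors) + self.eps

After each learning step, update_priorities would be called with the new TD errors so that surprising transitions are replayed more often.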

Examples of recorded environment replays for various RL agents:

(Recorded replays: MountainCar-v0, Pendulum-v0, VideoPinball-v0, procgen-coinrun-v0, Tennis-v0)

Requirements

It is recommended to install the codebase in a virtual environment (virtualenv or conda).

Quick install

Configure the use_gpu and (if on macOS) mac_package_manager (either macports or homebrew) params in setup.sh, then run it with:

sh setup.sh
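
For example, on a Mac without a GPU and with Homebrew installed, the relevant lines in setup.sh might look like the following (the exact variable syntax is an assumption; check the script itself):

# illustrative setup.sh configuration, not verified against the script
use_gpu=false
mac_package_manager="homebrew"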

Manual setup

You need to install the codebase's dependencies for your system, at minimum TensorFlow and OpenAI Gym (plus the Atari and ProcGen packages if you want those environments).

Quick Start

# start training
python train.py --sys ... --hparams ... --output_dir ...
# run tensorboard
tensorboard --logdir ...
# test agent
python train.py --sys ... --hparams ... --output_dir ... --test_only --render
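
For example, a hypothetical DQN run might look like this (the flag values, paths, and hparams name are illustrative; run --help for the exact options your checkout accepts):

# train a DQN agent and watch progress in TensorBoard
python train.py --sys local --env CartPole-v1 --hparams dqn --output_dir /tmp/rl/dqn_cartpole
tensorboard --logdir /tmp/rl/dqn_cartpole
# test the trained agent with rendering
python train.py --sys local --env CartPole-v1 --hparams dqn --output_dir /tmp/rl/dqn_cartpole --test_only --render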

Hyper-parameters

Check available flags with --help, see defaults.py for the default hyper-parameters, and see hparams/dqn.py for an example of agent-specific hyper-parameters.

  • hparams: Which hparams to use, defined under rl/hparams (a hypothetical example is sketched after this list).
  • sys: Which system environment to use.
  • env: Which RL environment to use.
  • output_dir: The directory for model checkpoints and TensorBoard summary.
  • train_steps: Number of steps to train the agent.
  • test_episodes: Number of episodes to test the agent.
  • eval_episodes: Number of episodes to evaluate the agent.
  • test_only: Test agent without training.
  • copies: Number of independent training/testing runs to do.
  • render: Render game play.
  • record_video: Record game play.
  • num_workers: Number of workers for asynchronous training.
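
To give a feel for the layout, here is a hypothetical agent-specific hparams definition in the spirit of hparams/dqn.py. The import path, helper, and field names are assumptions for illustration; consult defaults.py and rl/hparams for the real API:

# hypothetical sketch only; the real rl/hparams files may differ
from rl.hparams.defaults import default  # assumed helper returning the base hyper-parameters

def dqn_cartpole():
    hps = default()              # start from the shared defaults
    hps.agent = "DQN"            # which agent to build
    hps.learning_rate = 1e-3
    hps.memory = "proportional"  # simple vs. proportional prioritized replay
    hps.batch_size = 32
    return hps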

Documentation

More detailed documentation can be found here.

Contributing

We'd love to accept your contributions to this project. Please feel free to open an issue or submit a pull request. Contact us at [email protected] for potential collaborations or to join FOR.ai.