Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning and Extensions
PyTorch implementation of Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning with additional extensions such as PER, Noisy layers, N-step bootstrapping and a Dueling architecture, working towards a new Rainbow-DQN version. This implementation also supports running and training on several environments in parallel!
Implementations
- Baseline IQN Notebook
- Script version with all extensions: IQN

The IQN baseline in this repository is already a Double IQN version with target networks!
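For reference, the core ingredient of IQN is the cosine embedding of sampled quantile fractions τ, which modulates the state features. Below is a minimal sketch of this idea; layer sizes, names and structure are illustrative and not the repository's exact code:

```python
import math
import torch
import torch.nn as nn

class QuantileEmbedding(nn.Module):
    """Cosine embedding of sampled quantile fractions tau, as used in IQN."""
    def __init__(self, layer_size=512, n_cos=64):
        super().__init__()
        # fixed cosine basis i = 1..n_cos, shape (1, 1, n_cos)
        self.register_buffer(
            "pi_i",
            math.pi * torch.arange(1, n_cos + 1, dtype=torch.float32).view(1, 1, n_cos))
        self.linear = nn.Linear(n_cos, layer_size)

    def forward(self, batch_size, n_tau=8):
        # sample quantile fractions tau ~ U(0, 1)
        taus = torch.rand(batch_size, n_tau, 1, device=self.pi_i.device)
        # phi_j(tau) = ReLU(sum_i cos(pi * i * tau) * w_ij + b_j)
        phi = torch.relu(self.linear(torch.cos(taus * self.pi_i)))
        return phi, taus

# The embedded quantiles modulate the state features psi(s) elementwise, and a
# final head maps psi(s) * phi(tau) to one quantile value per action.
```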
Extensions
- Dueling IQN
- Noisy layer
- N-step bootstrapping
- Munchausen RL (a sketch of the Munchausen target follows after this list)
- Parallel environments for faster training (wall-clock time). For CartPole-v0, 3 workers reduced the training time to one third!
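Munchausen RL modifies the standard target by adding a scaled, clipped log-policy bonus to the reward and bootstrapping with a soft (entropy-regularized) value of the next state. The following is a hedged sketch on plain Q-values; in this repository the same idea is applied to the IQN quantile estimates, and alpha, entropy_tau and lo are illustrative defaults:

```python
import torch
import torch.nn.functional as F

def munchausen_target(q_target_s, q_target_next, actions, rewards, dones,
                      gamma=0.99, alpha=0.9, entropy_tau=0.03, lo=-1.0):
    """Munchausen target: r + alpha*clip(tau*log pi(a|s)) + gamma*soft V(s')."""
    # log-policy of the target network at the current state
    log_pi_s = F.log_softmax(q_target_s / entropy_tau, dim=1)
    munchausen_bonus = alpha * torch.clamp(
        entropy_tau * log_pi_s.gather(1, actions), min=lo, max=0.0)

    # soft value of the next state: sum_a pi(a|s') * (Q(s',a) - tau*log pi(a|s'))
    log_pi_next = F.log_softmax(q_target_next / entropy_tau, dim=1)
    pi_next = log_pi_next.exp()
    soft_v_next = (pi_next * (q_target_next - entropy_tau * log_pi_next)).sum(dim=1, keepdim=True)

    return rewards + munchausen_bonus + gamma * (1.0 - dones) * soft_v_next
```

Expected shapes in this sketch: q_target_s and q_target_next are (batch, n_actions), actions is (batch, 1) of dtype long, rewards and dones are (batch, 1).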
Train
With the script version it is possible to train on simple environments like CartPole-v0 and LunarLander-v2 or on Atari games with image inputs!
To run the script version:
python run.py -info iqn_run1
To run the script version on the Atari game Pong:
python run.py -env PongNoFrameskip-v4 -info iqn_pong1
Other hyperparameters and possible inputs
To see the options:
python run.py -h
-agent, choices=["iqn","iqn+per","noisy_iqn","noisy_iqn+per","dueling","dueling+per", "noisy_dueling","noisy_dueling+per"], Specify which type of IQN agent you want to train, default is IQN - baseline!
-env, Name of the Environment, default = BreakoutNoFrameskip-v4
-frames, Number of frames to train, default = 10 million
-eval_every, Evaluate every x frames, default = 250000
-eval_runs, Number of evaluation runs, default = 2
-seed, Random seed to replicate training runs, default = 1
-munchausen, choices=[0,1], Use Munchausen RL loss for training if set to 1 (True), default = 0
-bs, --batch_size, Batch size for updating the DQN, default = 8
-layer_size, Size of the hidden layer, default=512
-n_step, Multistep IQN, default = 1
-N, Number of quantiles, default = 8
-m, --memory_size, Replay memory size, default = 1e5
-lr, Learning rate, default = 2.5e-4
-g, --gamma, Discount factor gamma, default = 0.99
-t, --tau, Soft update parameter tau, default = 1e-3
-eps_frames, Number of frames over which epsilon is linearly annealed, default = 1 million
-min_eps, Final epsilon greedy value, default = 0.01
-info, Name of the training run
-w, --worker, Number of parallel environments. The batch size increases proportionally to the number of workers. More than 4 workers is not recommended, default = 1
-save_model, choices=[0,1], Specify whether the trained network shall be saved, default is 0 - not saved!
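As a combined example, a run using the dueling architecture with noisy layers, prioritized replay, 3-step bootstrapping and two parallel environments on LunarLander-v2 could be started like this (the run name is chosen freely):

python run.py -agent noisy_dueling+per -env LunarLander-v2 -n_step 3 -w 2 -info noisy_dueling_lunar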
Observe training results
tensorboard --logdir=runs
Dependencies
Trained and tested on:
- Python 3.6
- PyTorch 1.4.0
- Numpy 1.15.2
- gym 0.10.11
CartPole Results
IQN and Extensions (default hyperparameter):
Dueling IQN and Extensions (default hyperparameter):
Atari Results
IQN and M-IQN comparison (trained for only 500,000 frames, ~140 min).
Hyperparameters (an example command using these settings is sketched below the list):
- frames 500000
- eps_frames 75000
- min_eps 0.025
- eval_every 10000
- lr 1e-4
- t 5e-3
- m 15000
- N 32
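Based on the flags listed above, the M-IQN run should correspond roughly to the following call. The environment flag is omitted here, i.e. the default BreakoutNoFrameskip-v4 is assumed, and the run name is illustrative; setting -munchausen 0 gives the plain IQN run:

python run.py -frames 500000 -eps_frames 75000 -min_eps 0.025 -eval_every 10000 -lr 1e-4 -t 5e-3 -m 15000 -N 32 -munchausen 1 -info miqn_atari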
Performance after 10 million frames: score 258
ToDo:
- Comparison plot for n-step bootstrapping (n-step bootstrapping with n=3 seems to give a strong boost in learning compared to one-step bootstrapping; plots will follow, and a short sketch of the n-step transition is given after this list)
- Performance plot for Pong compared with Rainbow
- Adding Munchausen RL ☑
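As referenced in the list above, here is a hedged sketch of how an n-step (n=3) transition is typically built from a small deque of recent transitions before it is stored in the replay buffer; names are illustrative, not the repository's exact implementation:

```python
from collections import deque

def make_nstep_transition(nstep_buffer, gamma=0.99):
    """Collapse a deque of the last n transitions (s, a, r, s', done)
    into a single n-step transition."""
    state, action = nstep_buffer[0][0], nstep_buffer[0][1]
    next_state, done = nstep_buffer[-1][3], nstep_buffer[-1][4]
    # discounted sum of the n intermediate rewards: r_t + gamma*r_{t+1} + ...
    n_step_return = sum((gamma ** i) * r for i, (_, _, r, _, _) in enumerate(nstep_buffer))
    return state, action, n_step_return, next_state, done

# Usage: keep a deque(maxlen=3); once it is full, store the collapsed transition
# and bootstrap the TD target with gamma**3 * Q_target(s_{t+3}, a*).
```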
Help and issues:
I'm open to feedback, bug reports, improvements or anything else. Just leave me a message or contact me.
Author
- Sebastian Dittert
Feel free to use this code for your own projects or research. For citation:
@misc{IQN_and_Extensions,
author = {Dittert, Sebastian},
title = {Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning and Extensions},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/BY571/IQN}},
}