Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning and Extensions
PyTorch implementation of Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning with additional extensions such as PER, Noisy layers, N-step bootstrapping and a Dueling architecture, working towards a new Rainbow-DQN version. This implementation also supports running and training on several environments in parallel!
Implementations
- Baseline IQN Notebook
- Script version with all extensions: IQN

The IQN baseline in this repository is already a Double IQN version with target networks!
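For reference, the core ingredient of IQN is the cosine embedding of sampled quantile fractions τ, which modulates the state features. Below is a minimal sketch of this idea; layer sizes, names and structure are illustrative and not the repository's exact code:

```python
import math
import torch
import torch.nn as nn

class QuantileEmbedding(nn.Module):
    """Cosine embedding of sampled quantile fractions tau, as used in IQN."""
    def __init__(self, layer_size=512, n_cos=64):
        super().__init__()
        # fixed cosine basis i = 1..n_cos, shape (1, 1, n_cos)
        self.register_buffer(
            "pi_i",
            math.pi * torch.arange(1, n_cos + 1, dtype=torch.float32).view(1, 1, n_cos))
        self.linear = nn.Linear(n_cos, layer_size)

    def forward(self, batch_size, n_tau=8):
        # sample quantile fractions tau ~ U(0, 1)
        taus = torch.rand(batch_size, n_tau, 1, device=self.pi_i.device)
        # phi_j(tau) = ReLU(sum_i cos(pi * i * tau) * w_ij + b_j)
        phi = torch.relu(self.linear(torch.cos(taus * self.pi_i)))
        return phi, taus

# The embedded quantiles modulate the state features psi(s) elementwise, and a
# final head maps psi(s) * phi(tau) to one quantile value per action.
```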
Extensions
- Dueling IQN
- Noisy layer
- N-step bootstrapping
- Munchausen RL (a sketch of the Munchausen target follows after this list)
- Parallel environments for faster training (wall-clock time). For CartPole-v0, 3 workers reduced the training time to one third!
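Munchausen RL modifies the standard target by adding a scaled, clipped log-policy bonus to the reward and bootstrapping with a soft (entropy-regularized) value of the next state. The following is a hedged sketch on plain Q-values; in this repository the same idea is applied to the IQN quantile estimates, and alpha, entropy_tau and lo are illustrative defaults:

```python
import torch
import torch.nn.functional as F

def munchausen_target(q_target_s, q_target_next, actions, rewards, dones,
                      gamma=0.99, alpha=0.9, entropy_tau=0.03, lo=-1.0):
    """Munchausen target: r + alpha*clip(tau*log pi(a|s)) + gamma*soft V(s')."""
    # log-policy of the target network at the current state
    log_pi_s = F.log_softmax(q_target_s / entropy_tau, dim=1)
    munchausen_bonus = alpha * torch.clamp(
        entropy_tau * log_pi_s.gather(1, actions), min=lo, max=0.0)

    # soft value of the next state: sum_a pi(a|s') * (Q(s',a) - tau*log pi(a|s'))
    log_pi_next = F.log_softmax(q_target_next / entropy_tau, dim=1)
    pi_next = log_pi_next.exp()
    soft_v_next = (pi_next * (q_target_next - entropy_tau * log_pi_next)).sum(dim=1, keepdim=True)

    return rewards + munchausen_bonus + gamma * (1.0 - dones) * soft_v_next
```

Expected shapes in this sketch: q_target_s and q_target_next are (batch, n_actions), actions is (batch, 1) of dtype long, rewards and dones are (batch, 1).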
Train
With the script version it is possible to train on simple environments like CartPole-v0 and LunarLander-v2 or on Atari games with image inputs!
To run the script version:
python run.py -info iqn_run1
To run the script version on the Atari game Pong:
python run.py -env PongNoFrameskip-v4 -info iqn_pong1
Other hyperparameters and possible inputs
To see the options:
python run.py -h
-agent, choices=["iqn","iqn+per","noisy_iqn","noisy_iqn+per","dueling","dueling+per", "noisy_dueling","noisy_dueling+per"], Specify which type of IQN agent you want to train, default is IQN - baseline!
-env, Name of the Environment, default = BreakoutNoFrameskip-v4
-frames, Number of frames to train, default = 10 million
-eval_every, Evaluate every x frames, default = 250000
-eval_runs, Number of evaluation runs, default = 2
-seed, Random seed to replicate training runs, default = 1
-munchausen, choices=[0,1], Use Munchausen RL loss for training if set to 1 (True), default = 0
-bs, --batch_size, Batch size for updating the DQN, default = 8
-layer_size, Size of the hidden layer, default=512
-n_step, Multistep IQN, default = 1
-N, Number of quantiles, default = 8
-m, --memory_size, Replay memory size, default = 1e5
-lr, Learning rate, default = 2.5e-4
-g, --gamma, Discount factor gamma, default = 0.99
-t, --tau, Soft update parameter tau, default = 1e-3
-eps_frames, Number of frames over which epsilon is linearly annealed, default = 1 million
-min_eps, Final epsilon greedy value, default = 0.01
-info, Name of the training run
-w, --worker, Number of parallel environments. The batch size increases proportionally to the number of workers. More than 4 workers is not recommended, default = 1
-save_model, choices=[0,1], Specify whether the trained network shall be saved, default is 0 - not saved!
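As a combined example, a run using the dueling architecture with noisy layers, prioritized replay, 3-step bootstrapping and two parallel environments on LunarLander-v2 could be started like this (the run name is chosen freely):

python run.py -agent noisy_dueling+per -env LunarLander-v2 -n_step 3 -w 2 -info noisy_dueling_lunar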
Observe training results
tensorboard --logdir=runs
Dependencies
Trained and tested on:
- Python 3.6
- PyTorch 1.4.0
- Numpy 1.15.2
- gym 0.10.11
CartPole Results
IQN and Extensions (default hyperparameter):
Dueling IQN and Extensions (default hyperparameter):
Atari Results
IQN and M-IQN comparison (trained for only 500,000 frames, ~140 min).
Hyperparameters (an example command using these settings is sketched below the list):
- frames 500000
- eps_frames 75000
- min_eps 0.025
- eval_every 10000
- lr 1e-4
- t 5e-3
- m 15000
- N 32
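Based on the flags listed above, the M-IQN run should correspond roughly to the following call. The environment flag is omitted here, i.e. the default BreakoutNoFrameskip-v4 is assumed, and the run name is illustrative; setting -munchausen 0 gives the plain IQN run:

python run.py -frames 500000 -eps_frames 75000 -min_eps 0.025 -eval_every 10000 -lr 1e-4 -t 5e-3 -m 15000 -N 32 -munchausen 1 -info miqn_atari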
Performance after 10 million frames: score 258
ToDo:
- Comparison plot for n-step bootstrapping (n-step bootstrapping with n=3 seems to give a strong boost in learning compared to one-step bootstrapping; plots will follow, and a short sketch of the n-step transition is given after this list)
- Performance plot for Pong compared with Rainbow
- Adding Munchausen RL ☑
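As referenced in the list above, here is a hedged sketch of how an n-step (n=3) transition is typically built from a small deque of recent transitions before it is stored in the replay buffer; names are illustrative, not the repository's exact implementation:

```python
from collections import deque

def make_nstep_transition(nstep_buffer, gamma=0.99):
    """Collapse a deque of the last n transitions (s, a, r, s', done)
    into a single n-step transition."""
    state, action = nstep_buffer[0][0], nstep_buffer[0][1]
    next_state, done = nstep_buffer[-1][3], nstep_buffer[-1][4]
    # discounted sum of the n intermediate rewards: r_t + gamma*r_{t+1} + ...
    n_step_return = sum((gamma ** i) * r for i, (_, _, r, _, _) in enumerate(nstep_buffer))
    return state, action, n_step_return, next_state, done

# Usage: keep a deque(maxlen=3); once it is full, store the collapsed transition
# and bootstrap the TD target with gamma**3 * Q_target(s_{t+3}, a*).
```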
Help and issues:
I'm open to feedback, bug reports, improvements or anything else. Just leave me a message or contact me.
Author
- Sebastian Dittert
Feel free to use this code for your own projects or research. For citation:
@misc{IQN_and_Extensions,
author = {Dittert, Sebastian},
title = {Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning and Extensions},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/BY571/IQN}},
}