Self-Imitation-Learning with A2C

This is a PyTorch implementation of A2C + SIL, which is basically the same as the OpenAI Baselines version. The paper can be found here.
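For readers new to SIL, the snippet below is a minimal sketch of the self-imitation loss as described in the paper, written in plain Python so that it runs without dependencies. It is not the code from this repository: the function name `sil_losses`, the `value_coef` default, and the mean over the batch are assumptions made for illustration. In a real PyTorch implementation the clipped advantage would also be treated as a constant in the policy term.

```python
def sil_losses(log_probs, values, returns, value_coef=0.01):
    """Illustrative SIL losses (hypothetical helper, not this repo's API).

    log_probs: log pi(a|s) of the stored actions
    values:    value estimates V(s)
    returns:   discounted returns R taken from the replay buffer
    value_coef is an assumed weighting for the value term.
    """
    n = len(returns)
    policy_loss = 0.0
    value_loss = 0.0
    for logp, v, r in zip(log_probs, values, returns):
        # Only transitions whose return exceeded the value estimate contribute.
        clipped_adv = max(r - v, 0.0)
        policy_loss += -logp * clipped_adv
        value_loss += 0.5 * clipped_adv ** 2
    # Assumption: average over the whole batch.
    policy_loss /= n
    value_loss /= n
    total = policy_loss + value_coef * value_loss
    return policy_loss, value_loss, total
```

The second transition in a call such as `sil_losses([-1.0, -2.0], [1.0, 2.0], [3.0, 1.0])` adds nothing to either loss, because its return is below the value estimate. That clipping is what restricts imitation to past experiences that turned out better than expected.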

TODO List

[ ] Add PPO with SIL
[ ] Add more results

Requirements

python-3.5.2
openai-baselines
pytorch-0.4.0

Installation

Install OpenAI Baselines. An older version of openai-baselines is currently required; this will be resolved in the future.

# clone the openai baselines
git clone https://github.com/openai/baselines.git
cd baselines
git checkout 366f486
pip install -e .

How to use the code

Train the network (pass --cuda only if you have a GPU):

python train.py --env-name 'PongNoFrameskip-v4' --cuda

Test the network:

python demo.py --env-name 'PongNoFrameskip-v4'

You can also train plain A2C without SIL by adding the --no-sil flag:

python train.py --env-name 'PongNoFrameskip-v4' --cuda --no-sil
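The flags used above could be declared roughly as follows. This is a hypothetical sketch using the standard argparse module, not the actual contents of train.py: only the flag names --env-name, --cuda, and --no-sil come from the commands above, while the defaults, help strings, and the `build_parser` helper are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser; the real train.py may define more options.
    parser = argparse.ArgumentParser(description="A2C + SIL options (illustrative)")
    parser.add_argument("--env-name", type=str, default="PongNoFrameskip-v4",
                        help="Gym environment id")
    parser.add_argument("--cuda", action="store_true",
                        help="run on the GPU if one is available")
    parser.add_argument("--no-sil", action="store_true",
                        help="disable SIL and train plain A2C")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(["--env-name", "PongNoFrameskip-v4", "--no-sil"])
use_sil = not args.no_sil  # SIL is on by default and switched off by the flag
```

Using a store_true flag named --no-sil keeps SIL enabled by default, which matches the commands above, where the plain training command uses SIL and the flag turns it off.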

Training Performance

Because of limited time, I only ran Pong for 2 million steps. The results for MontezumaRevenge will be uploaded later!
[Figure: Scheme]
Another result, for Freeway, which is consistent with the original paper:
[Figure: freeway]

Demo: FreewayNoFrameskip-v4

[Figure: freewaydemo]

Acknowledgement

@junhyukoh for the original code
