IMPALA-Distributed-Tensorflow
Implementation of IMPALA with Distributed TensorFlow
Information
- These results were obtained with only 32 threads.
- A total of 32 CPUs were used; 4 environments were configured per game, and 8 games were trained in total.
- TensorFlow implementation
- Uses a DQN-style model for action inference
- Uses Distributed TensorFlow to implement the actors
- Trained for 1 day
- Hyperparameters are the same as in the paper (a training-op sketch follows this list):
  - start learning rate = 0.0006
  - end learning rate = 0
  - learning frames = 1e6
  - gradient clip norm = 40
  - trajectory length = 20
  - batch size = 32
  - reward clipping = -1 ~ 1
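
A minimal TF 1.14 sketch of how these settings could be wired into a training op. `build_train_op` is a hypothetical helper, and the RMSProp constants follow DeepMind's reference IMPALA implementation; none of this is taken from this repository's code.

```python
import tensorflow as tf

def build_train_op(loss):
    """Training op matching the listed hyperparameters (illustrative)."""
    global_step = tf.train.get_or_create_global_step()

    # Linearly anneal the learning rate from 0.0006 to 0 over 1e6 steps.
    learning_rate = tf.train.polynomial_decay(
        learning_rate=0.0006,
        global_step=global_step,
        decay_steps=int(1e6),
        end_learning_rate=0.0)

    # RMSProp constants follow DeepMind's reference implementation.
    optimizer = tf.train.RMSPropOptimizer(learning_rate, decay=0.99,
                                          momentum=0.0, epsilon=0.1)

    # Clip the global gradient norm to 40 before applying.
    grads, variables = zip(*optimizer.compute_gradients(loss))
    grads, _ = tf.clip_by_global_norm(grads, 40.0)
    return optimizer.apply_gradients(zip(grads, variables),
                                     global_step=global_step)
```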
Dependencies
- tensorflow==1.14.0
- gym[atari]
- numpy
- tensorboardX
- opencv-python
Overall Schema
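The schema itself is an image in the original README. As a rough Distributed TensorFlow sketch of the actor-learner layout it depicts: job names, ports, and flags below are illustrative assumptions, not this repo's actual code.

```python
import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_string('job_name', 'learner', "'learner' or 'actor' (assumed flag)")
flags.DEFINE_integer('task', 0, 'task index within the job')
FLAGS = flags.FLAGS


def main(_):
    # Hypothetical single-machine cluster: one learner and two actors.
    cluster = tf.train.ClusterSpec({
        'learner': ['localhost:2222'],
        'actor':   ['localhost:2223', 'localhost:2224'],
    })
    server = tf.train.Server(cluster, job_name=FLAGS.job_name,
                             task_index=FLAGS.task)

    # Global parameters live on the learner; every process builds the same
    # graph and reads/writes the shared variables through the session.
    with tf.device('/job:learner/task:0'):
        global_step = tf.train.get_or_create_global_step()
        # ... model variables, loss, and train_op would be defined here ...

    with tf.Session(server.target) as sess:
        if FLAGS.job_name == 'actor':
            pass  # step local environments, send trajectories, sync parameters
        else:
            pass  # consume trajectories and run the training op


if __name__ == '__main__':
    tf.app.run()
```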

Model Architecture
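The architecture diagram is an image in the original README. Below is a minimal TF 1.14 sketch of a DQN-style torso with policy and value heads; the layer sizes are the standard DQN ones and are an assumption about this repo.

```python
import tensorflow as tf

def dqn_model(frames, num_actions):
    """DQN-style torso with policy and value heads (illustrative)."""
    x = tf.cast(frames, tf.float32) / 255.0           # frames: [batch, 84, 84, 4]
    x = tf.layers.conv2d(x, 32, 8, strides=4, activation=tf.nn.relu)
    x = tf.layers.conv2d(x, 64, 4, strides=2, activation=tf.nn.relu)
    x = tf.layers.conv2d(x, 64, 3, strides=1, activation=tf.nn.relu)
    x = tf.layers.dense(tf.layers.flatten(x), 512, activation=tf.nn.relu)
    policy_logits = tf.layers.dense(x, num_actions)   # actor head
    baseline = tf.squeeze(tf.layers.dense(x, 1), -1)  # value head for V-trace
    return policy_logits, baseline
```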

How to Run
- See start.sh.
- Learns 8 games at a time; each game runs 4 environments (a hypothetical launcher sketch follows).
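
A hypothetical Python equivalent of such a launcher, spawning one learner plus one actor per environment. `trainer.py` and its flags are assumptions for illustration, not the repo's actual file names.

```python
import subprocess

NUM_GAMES = 8
ENVS_PER_GAME = 4

# One learner process plus one actor process per environment.
procs = [subprocess.Popen(
    ['python', 'trainer.py', '--job_name', 'learner', '--task', '0'])]
for task in range(NUM_GAMES * ENVS_PER_GAME):
    procs.append(subprocess.Popen(
        ['python', 'trainer.py', '--job_name', 'actor', '--task', str(task)]))

for p in procs:
    p.wait()
```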
Result
Video
*(Gameplay videos: Breakout | Pong | Seaquest | Space-Invader | Boxing | Star-Gunner | Kung-Fu | Demon)*
Plotting
*(Training-reward curves for the 8 games)*
Comparing Reward Clipping Methods
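The two variants compared here: abs_one clips rewards hard to [-1, 1], while soft_asymmetric squashes them with tanh and down-weights negative rewards. A sketch in the style of DeepMind's scalable_agent reference implementation (assumed, not this repo's exact code):

```python
import tensorflow as tf

def clip_rewards(rewards, mode='abs_one'):
    if mode == 'abs_one':
        # Hard clipping to [-1, 1].
        return tf.clip_by_value(rewards, -1.0, 1.0)
    elif mode == 'soft_asymmetric':
        # Soft tanh squashing; negative rewards get less weight than
        # positive ones (constants follow DeepMind's scalable_agent).
        squeezed = tf.tanh(rewards / 5.0)
        return tf.where(rewards < 0, 0.3 * squeezed, squeezed) * 5.0
    raise ValueError(mode)
```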
Video
*(Gameplay videos: abs_one | soft_asymmetric)*
Plotting
*(Training curves: abs_one)*
*(Training curves: soft_asymmetric)*
Is Attention Really Working?
*(Attention heatmap over a Breakout frame)*
- The blocks at the top of the screen are ignored.
- The ball and the paddle are attended to.
- Some empty space is also attended to, because the model is not yet fully trained.
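
One way such a heatmap can be produced (a sketch under assumptions, not this repo's code): normalize the attention layer's per-cell weights, upsample them to the frame size, and blend them over the observation.

```python
import numpy as np
import cv2

def attention_overlay(frame, attn):
    """Blend a [h, w] attention map over a uint8 [H, W, 3] frame.

    `attn` is assumed to come from the relational/attention layer;
    the shapes and names here are illustrative.
    """
    heat = attn / (attn.max() + 1e-8)                 # normalize to [0, 1]
    heat = cv2.resize(heat, frame.shape[:2][::-1])    # upsample to (W, H)
    heat = cv2.applyColorMap((heat * 255).astype(np.uint8), cv2.COLORMAP_JET)
    return cv2.addWeighted(frame, 0.6, heat, 0.4, 0)  # alpha-blend overlay
```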
Todo
- [x] CPU-only training method
- [x] Distributed TensorFlow
- [x] Model fix to prevent collapse
- [x] Reward Clipping Experiment
- [x] Parameter copying from global learner
- [x] Add Relational Reinforcement Learning
- [x] Add Action information to Model
- [x] Multi-task learning
- [x] Add Recurrent Model
- [x] Training on GPU, Inference on CPU