IMPALA-Distributed-Tensorflow
Implementation of IMPALA with Distributed TensorFlow
Information
- These results were obtained with only 32 threads.
- A total of 32 CPUs were used; 4 environments were configured per game, and 8 games were trained in total.
- TensorFlow implementation
- Uses a DQN-style model for action inference
- Uses Distributed TensorFlow to implement the actors
- Trained for 1 day
- Hyperparameters are the same as in the paper (a training-op sketch follows this list):
  - start learning rate = 0.0006
  - end learning rate = 0
  - learning frames = 1e6
  - gradient clip norm = 40
  - trajectory length = 20
  - batch size = 32
  - reward clipping = -1 ~ 1
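
A minimal TF 1.14 sketch of how these settings could be wired into a training op. `build_train_op` is a hypothetical helper, and the RMSProp constants follow DeepMind's reference IMPALA implementation; none of this is taken from this repository's code.

```python
import tensorflow as tf

def build_train_op(loss):
    """Training op matching the listed hyperparameters (illustrative)."""
    global_step = tf.train.get_or_create_global_step()

    # Linearly anneal the learning rate from 0.0006 to 0 over 1e6 steps.
    learning_rate = tf.train.polynomial_decay(
        learning_rate=0.0006,
        global_step=global_step,
        decay_steps=int(1e6),
        end_learning_rate=0.0)

    # RMSProp constants follow DeepMind's reference implementation.
    optimizer = tf.train.RMSPropOptimizer(learning_rate, decay=0.99,
                                          momentum=0.0, epsilon=0.1)

    # Clip the global gradient norm to 40 before applying.
    grads, variables = zip(*optimizer.compute_gradients(loss))
    grads, _ = tf.clip_by_global_norm(grads, 40.0)
    return optimizer.apply_gradients(zip(grads, variables),
                                     global_step=global_step)
```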
Dependencies
- tensorflow==1.14.0
- gym[atari]
- numpy
- tensorboardX
- opencv-python
Overall Schema
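The schema itself is an image in the original README. As a rough Distributed TensorFlow sketch of the actor-learner layout it depicts: job names, ports, and flags below are illustrative assumptions, not this repo's actual code.

```python
import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_string('job_name', 'learner', "'learner' or 'actor' (assumed flag)")
flags.DEFINE_integer('task', 0, 'task index within the job')
FLAGS = flags.FLAGS


def main(_):
    # Hypothetical single-machine cluster: one learner and two actors.
    cluster = tf.train.ClusterSpec({
        'learner': ['localhost:2222'],
        'actor':   ['localhost:2223', 'localhost:2224'],
    })
    server = tf.train.Server(cluster, job_name=FLAGS.job_name,
                             task_index=FLAGS.task)

    # Global parameters live on the learner; every process builds the same
    # graph and reads/writes the shared variables through the session.
    with tf.device('/job:learner/task:0'):
        global_step = tf.train.get_or_create_global_step()
        # ... model variables, loss, and train_op would be defined here ...

    with tf.Session(server.target) as sess:
        if FLAGS.job_name == 'actor':
            pass  # step local environments, send trajectories, sync parameters
        else:
            pass  # consume trajectories and run the training op


if __name__ == '__main__':
    tf.app.run()
```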

Model Architecture
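The architecture diagram is an image in the original README. Below is a minimal TF 1.14 sketch of a DQN-style torso with policy and value heads; the layer sizes are the standard DQN ones and are an assumption about this repo.

```python
import tensorflow as tf

def dqn_model(frames, num_actions):
    """DQN-style torso with policy and value heads (illustrative)."""
    x = tf.cast(frames, tf.float32) / 255.0           # frames: [batch, 84, 84, 4]
    x = tf.layers.conv2d(x, 32, 8, strides=4, activation=tf.nn.relu)
    x = tf.layers.conv2d(x, 64, 4, strides=2, activation=tf.nn.relu)
    x = tf.layers.conv2d(x, 64, 3, strides=1, activation=tf.nn.relu)
    x = tf.layers.dense(tf.layers.flatten(x), 512, activation=tf.nn.relu)
    policy_logits = tf.layers.dense(x, num_actions)   # actor head
    baseline = tf.squeeze(tf.layers.dense(x, 1), -1)  # value head for V-trace
    return policy_logits, baseline
```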

How to Run
- See start.sh.
- Learns 8 games at a time; each game runs 4 environments (a hypothetical launcher sketch follows).
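
A hypothetical Python equivalent of such a launcher, spawning one learner plus one actor per environment. `trainer.py` and its flags are assumptions for illustration, not the repo's actual file names.

```python
import subprocess

NUM_GAMES = 8
ENVS_PER_GAME = 4

# One learner process plus one actor process per environment.
procs = [subprocess.Popen(
    ['python', 'trainer.py', '--job_name', 'learner', '--task', '0'])]
for task in range(NUM_GAMES * ENVS_PER_GAME):
    procs.append(subprocess.Popen(
        ['python', 'trainer.py', '--job_name', 'actor', '--task', str(task)]))

for p in procs:
    p.wait()
```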
Result
Video
*(Gameplay videos: Breakout | Pong | Seaquest | Space-Invader | Boxing | Star-Gunner | Kung-Fu | Demon)*
Plotting
*(Training-reward curves for the 8 games)*
Comparing Reward Clipping Methods
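The two variants compared here: abs_one clips rewards hard to [-1, 1], while soft_asymmetric squashes them with tanh and down-weights negative rewards. A sketch in the style of DeepMind's scalable_agent reference implementation (assumed, not this repo's exact code):

```python
import tensorflow as tf

def clip_rewards(rewards, mode='abs_one'):
    if mode == 'abs_one':
        # Hard clipping to [-1, 1].
        return tf.clip_by_value(rewards, -1.0, 1.0)
    elif mode == 'soft_asymmetric':
        # Soft tanh squashing; negative rewards get less weight than
        # positive ones (constants follow DeepMind's scalable_agent).
        squeezed = tf.tanh(rewards / 5.0)
        return tf.where(rewards < 0, 0.3 * squeezed, squeezed) * 5.0
    raise ValueError(mode)
```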
Video
*(Gameplay videos: abs_one | soft_asymmetric)*
Plotting
*(Training curves: abs_one)*
*(Training curves: soft_asymmetric)*
Is Attention Really Working?
*(Attention heatmap over a Breakout frame)*
- The blocks at the top of the screen are ignored.
- The ball and the paddle are attended to.
- Some empty space is also attended to, because the model is not yet fully trained.
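
One way such a heatmap can be produced (a sketch under assumptions, not this repo's code): normalize the attention layer's per-cell weights, upsample them to the frame size, and blend them over the observation.

```python
import numpy as np
import cv2

def attention_overlay(frame, attn):
    """Blend a [h, w] attention map over a uint8 [H, W, 3] frame.

    `attn` is assumed to come from the relational/attention layer;
    the shapes and names here are illustrative.
    """
    heat = attn / (attn.max() + 1e-8)                 # normalize to [0, 1]
    heat = cv2.resize(heat, frame.shape[:2][::-1])    # upsample to (W, H)
    heat = cv2.applyColorMap((heat * 255).astype(np.uint8), cv2.COLORMAP_JET)
    return cv2.addWeighted(frame, 0.6, heat, 0.4, 0)  # alpha-blend overlay
```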
Todo
- [x] CPU-only training method
- [x] Distributed TensorFlow
- [x] Model fix to prevent collapse
- [x] Reward Clipping Experiment
- [x] Parameter copying from global learner
- [x] Add Relational Reinforcement Learning
- [x] Add Action information to Model
- [x] Multi-task learning
- [x] Add Recurrent Model
- [x] Training on GPU, Inference on CPU