Asynchronous-Methods-for-Deep-Reinforcement-Learning

Training with process/core-level parallelism

Open thisiscam opened this issue 9 years ago • 5 comments

Hi @Zeta36

Great project! I'm trying to run some experiments with the code. It seems that the code currently uses threading with TensorFlow, and from my observation the training loop is not truly parallel because it runs on threads rather than separate processes. Ideally, each learner would run in its own process to fully utilize a modern machine.

This might be relevant: http://stackoverflow.com/questions/34900246/tensorflow-passing-a-session-to-a-python-multiprocess

But it looks like bad news: I can't just spawn a bunch of processes and have them share the same TensorFlow session. So maybe a distributed TensorFlow session is what we need: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/distributed/index.md
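Roughly, a distributed setup might look like the following. This is only a minimal sketch built on TF's tf.train.ClusterSpec / tf.train.Server; the job names, ports, and task indices are placeholders, not anything from this repo:

```python
import sys
import tensorflow as tf

# Hypothetical cluster layout: one parameter server plus one worker process
# per learner, all on the same machine (ports are placeholders).
cluster = tf.train.ClusterSpec({
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224", "localhost:2225"],
})

job_name = sys.argv[1]          # "ps" or "worker"
task_index = int(sys.argv[2])   # which process this is

server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    # The parameter server process just hosts the shared variables.
    server.join()
else:
    # Pin the shared network variables to the ps job; keep compute local.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task_index,
            cluster=cluster)):
        # ... build the shared global network and this worker's learner here ...
        pass
    with tf.Session(server.target) as sess:
        # ... run this learner's training loop ...
        pass
```

Each learner would then be started as a separate OS process (one invocation per task index), so the Python-side work is no longer serialized in a single interpreter.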

thisiscam avatar Apr 21 '16 06:04 thisiscam

I've run this as well, and as @thisiscam mentions, it doesn't appear to actually run in parallel with good utilization. When I run the program, most Python threads sit at about 5% core utilization except for one thread at 97%, which means that collectively only about two cores are actually in use.

ahundt avatar Dec 10 '16 19:12 ahundt

@thisiscam distributed TensorFlow, as per your link, is for many physical machines networked together; before taking that approach, it is important to fully utilize the capabilities of a single machine.

ahundt avatar Dec 10 '16 19:12 ahundt

The threading mechanism with queues is more likely the right way to go: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/threading_and_queues/index.md
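For illustration, a minimal sketch of that queue pattern applied to actor/learner threads; the tensor shapes and the batch size here are made up, not taken from this repo:

```python
import tensorflow as tf

# Hypothetical experience tensors produced by each actor thread
# (an 84x84x4 state stack and a scalar reward; shapes are illustrative).
state = tf.placeholder(tf.float32, shape=[84, 84, 4])
reward = tf.placeholder(tf.float32, shape=[])

# Shared FIFO queue that decouples the actor threads from the learner.
queue = tf.FIFOQueue(capacity=1000,
                     dtypes=[tf.float32, tf.float32],
                     shapes=[[84, 84, 4], []])
enqueue_op = queue.enqueue([state, reward])
states, rewards = queue.dequeue_many(32)  # one training batch for the learner

# Each actor thread would run sess.run(enqueue_op, feed_dict=...) in its loop,
# while the learner thread repeatedly consumes (states, rewards); the queue ops
# run outside the GIL, so the threads block on the queue rather than on Python.
```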

ahundt avatar Dec 10 '16 19:12 ahundt

@thisiscam I saw you made some changes in a branch here: https://github.com/thisiscam/Asynchronous-Methods-for-Deep-Reinforcement-Learning/tree/ale

But it looks like you forgot to add a file for some of the functions, like load_ale(), which is simply not present.

ahundt avatar Dec 10 '16 20:12 ahundt

load_ale comes from ale_python_interface: https://github.com/bbitmaster/ale_python_interface

However, in my experiments I have not yet found parameters that work well enough. It might be due to some bug in the code.
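Since the helper itself is missing from the branch, here is only a guess at what load_ale() might look like on top of ale_python_interface; the function name, arguments, and settings are reconstructed for illustration, not copied from the branch:

```python
from ale_python_interface import ALEInterface

def load_ale(rom_path, seed=123, repeat_action_probability=0.0):
    """Hypothetical reconstruction of the missing load_ale() helper."""
    ale = ALEInterface()
    # ALE settings; bytes keys work under both Python 2 and 3.
    ale.setInt(b'random_seed', seed)
    ale.setFloat(b'repeat_action_probability', repeat_action_probability)
    ale.loadROM(rom_path.encode() if isinstance(rom_path, str) else rom_path)
    # Return the emulator plus the game's minimal action set for the agent.
    return ale, ale.getMinimalActionSet()
```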

thisiscam avatar Dec 10 '16 21:12 thisiscam