batch-ppo
GPU doesn't seem to work
I've set `use_gpu = True`, but GPU usage is close to zero when running the code. When I look into TensorBoard, it shows that all operations are assigned to the CPU. When I disable `sess_config = tf.ConfigProto(allow_soft_placement=True)` and force it to run on the GPU, the console throws this error:
`INFO:tensorflow:Start a new run and write summaries and checkpoints to E:\Code\PythonScripts\DeepRL\BatchPPO\20180308T091941-pendulum.
WARNING:tensorflow:Number of agents should divide episodes per update.
2018-03-08 09:19:41.315004: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-03-08 09:19:41.595863: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 960 major: 5 minor: 2 memoryClockRate(GHz): 1.1775
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.64GiB
2018-03-08 09:19:41.596493: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0, compute capability: 5.2)
INFO:tensorflow:Graph contains 42003 trainable variables.
2018-03-08 09:19:57.811479: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0, compute capability: 5.2)
Traceback (most recent call last):
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
return fn(*args)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1293, in _run_fn
self._extend_graph()
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1354, in _extend_graph
self._session, graph_def.SerializeToString(), status)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'ppo_temporary/episodes/Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Switch: GPU CPU
VariableV2: CPU
Identity: CPU
Assign: CPU
RefSwitch: GPU CPU
ScatterUpdate: CPU
AssignAdd: CPU
[[Node: ppo_temporary/episodes/Variable = VariableV2[container="", dtype=DT_INT32, shape=[10], shared_name="", _device="/device:GPU:0"]]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/Code/PythonScripts/DeepRL/BatchPPO/agents/scripts/train.py", line 163, in
Caused by op 'ppo_temporary/episodes/Variable', defined at:
File "E:/Code/PythonScripts/DeepRL/BatchPPO/agents/scripts/train.py", line 163, in
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'ppo_temporary/episodes/Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available. Colocation Debug Info: Colocation group had the following types and devices: Switch: GPU CPU VariableV2: CPU Identity: CPU Assign: CPU RefSwitch: GPU CPU ScatterUpdate: CPU AssignAdd: CPU [[Node: ppo_temporary/episodes/Variable = VariableV2[container="", dtype=DT_INT32, shape=[10], shared_name="", _device="/device:GPU:0"]]]`
It seems that TensorFlow does not allow assigning an int-type variable to the GPU.
BTW, this runs on Windows 10, with TensorFlow 1.4.
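For what it's worth, the placement failure and the soft-placement fallback can be reproduced with a minimal sketch (written against the `tf.compat.v1` API so it also runs under TensorFlow 2.x; in TF 1.x an int32 `VariableV2` has no GPU kernel, so an explicit `/gpu:0` pin is only satisfiable when soft placement may move the op back to the CPU):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Pin an int32 counter to the GPU. In TF 1.x the underlying variable op has
# no int32 GPU kernel, so this explicit placement can only be satisfied when
# soft placement is allowed to move the op back to the CPU.
with tf.device('/gpu:0'):
    counter = tf.compat.v1.get_variable(
        'counter', shape=[], dtype=tf.int32,
        initializer=tf.compat.v1.zeros_initializer(), trainable=False)
    increment = counter.assign_add(1)

config = tf.compat.v1.ConfigProto(
    allow_soft_placement=True,   # fall back to CPU for unsupported kernels
    log_device_placement=True)   # log the final device of every op

with tf.compat.v1.Session(config=config) as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    sess.run(increment)  # runs on CPU via soft placement
```

With `allow_soft_placement=False`, TF 1.x raises the `InvalidArgumentError` shown above instead of falling back.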
Hi @fengredrum. In case this is still an issue, could you try wrapping your network implementation in a `with tf.device('/gpu:0')` block?
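If helpful, that suggestion might look like the following minimal sketch (using `tf.compat.v1` so it also runs under TensorFlow 2.x; `build_network` and its layer sizes are hypothetical, not the actual batch-ppo network): variables created inside the `tf.device` block are pinned to the GPU, while everything outside keeps the default placement.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

def build_network(observations):
    # Hypothetical two-layer feed-forward value head, for illustration only.
    w1 = tf.compat.v1.get_variable('w1', [3, 64], tf.float32)
    w2 = tf.compat.v1.get_variable('w2', [64, 1], tf.float32)
    return tf.matmul(tf.nn.relu(tf.matmul(observations, w1)), w2)

observations = tf.compat.v1.placeholder(tf.float32, [None, 3])

# Pin only the network to the GPU; other ops keep their default device.
with tf.device('/gpu:0'):
    value = build_network(observations)
```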
The neural network is assigned to the GPU, which I've checked in TensorBoard. The problem occurs in `agents/ppo/memory.py`, because `self._length` is an int32 variable. I tried initializing it as a `tf.float32` variable and then using `tf.to_int32` to bypass the problem. The code works fine on CPU. However, when running on the GPU, it doesn't seem to learn anything. Maybe there is some elusive bug in TensorFlow? LOL
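That workaround might look roughly like this (a hedged sketch, not the actual `memory.py` code; the variable name and capacity are made up): store the length as float32, which does have a GPU variable kernel in TF 1.x, and cast back to int32 at the point of use. Repeated float round-trips like this are exactly the kind of place where subtle training bugs can creep in.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

capacity = 10  # hypothetical number of parallel episodes

with tf.device('/gpu:0'):
    # Store episode lengths as float32 so the variable has a GPU kernel.
    length_var = tf.compat.v1.get_variable(
        'episode_length', shape=[capacity], dtype=tf.float32,
        initializer=tf.compat.v1.zeros_initializer(), trainable=False)

# Cast back to int32 wherever an integer length is needed
# (tf.to_int32 is the TF 1.x spelling of this cast).
length = tf.cast(length_var, tf.int32)
```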
Thanks for providing more details. I don't think the replay buffer should be placed on the GPU, since it can grow quite large, especially when training from pixel observations. All ops should default to CPU because of the `with tf.device('/cpu:0')` block in `train.py`, and the network and RNN states should be specifically assigned to the GPU inside `ppo.py`. Is that not what is happening for you?
I tried running the default pendulum trainer. When I turn `use_gpu` on, it freezes during step 0 with no error; it runs fine otherwise. TensorFlow works fine with my GPU on other operations.
TensorFlow 1.8, Ubuntu 18.04, Nvidia GTX 1080, CUDA 9.0.
If my understanding is correct, TensorFlow will automatically give priority to the GPU for supported ops. Perhaps the `use_gpu` option should be removed if it doesn't work.
I agree; the transitions can be very large when training from pixel observations. I originally thought storing them in GPU memory would alleviate the communication cost between CPU and GPU, but it turns out that may not be the optimal solution when training on a single machine. Thank you for your constructive view.
@colinskow Could you try running without environment processes (`--noenv_processes`), please? When there is a crash in one of the processes, it can cause the program to deadlock before anything is printed.
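For reference, assuming the standard invocation from the repo README (the logdir path is a placeholder), the flag would be passed like this:

```shell
python3 -m agents.scripts.train \
  --logdir=/path/to/logdir --config=pendulum --noenv_processes
```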
@fengredrum The collected episodes should be stored in CPU memory in almost all scenarios. Is this not the case? We have the config options `batch_size` and `chunk_length` if you don't want to train on the full batch of episodes, in order to fit the network activations on the GPU.
Yes, it works exactly as you describe. I'm implementing Batch-PPO based on my understanding, and I've added several DL tricks to improve stability and performance. Still debugging; I hope it can reach or even beat your score, LOL.
I am currently trying to train on the GPU as well, but I am experiencing different issues than those listed above. At a certain step in training, I believe when it is about to update the global network, everything stops: on CPU everything is fine, I get some KL-cutoff prompts and training continues, but on GPU, CPU usage goes to zero, GPU usage stays at zero, and nothing else appears in the terminal (for over half an hour, after which I gave up).
What I changed: I added my own custom environment and network.
What I tried:
- the `--noenv_processes` argument: did not change anything, still no activity
- switching to the default feed-forward categorical network: no change
- checking out a clean clone and running the pendulum config with GPU enabled: after the first "Phase train" prompt for step 0, no more output and no CPU or GPU activity
What I'm running on: Ubuntu 16.04, TensorFlow 1.10, CUDA 9.0.