Running distributed_all_reduce in only CPU mode
I am running distributed TensorFlow over gRPC on CPUs only. I enabled the distributed_all_reduce variable update mode with 'all_reduce_spec = xring'.
Is this mode supposed to work for CPU-only distributed runs? If so, does it need a separate controller process in addition to the workers?
I am getting errors such as: Unknown device: /job:worker/replica:0/task:2/device:CPU:0 all devices: CPU:0, /job:worker/replica:0/task:0/cpu:0, /job:worker/replica:0/task:0/device:CPU:0
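One way to read the error above is to compare the task named in the failing device string with the tasks that appear in the "all devices" list. The sketch below is only a diagnostic aid: the helper `visible_tasks` and the regular expression are hypothetical, not part of TensorFlow, and the device strings are copied from the error message. It shows that only task 0 is visible while task 2 was requested; it does not establish why.

```python
import re

# Matches the job/replica/task prefix of a fully qualified TensorFlow device name.
DEVICE_RE = re.compile(r"/job:(?P<job>\w+)/replica:(?P<replica>\d+)/task:(?P<task>\d+)")

def visible_tasks(device_names):
    """Return the set of (job, task) pairs found in a list of device names.

    Names without a job/task prefix (for example a bare "CPU:0") are skipped.
    """
    tasks = set()
    for name in device_names:
        match = DEVICE_RE.match(name)
        if match:
            tasks.add((match.group("job"), int(match.group("task"))))
    return tasks

# Strings taken verbatim from the error message in this thread.
requested = "/job:worker/replica:0/task:2/device:CPU:0"
available = [
    "CPU:0",
    "/job:worker/replica:0/task:0/cpu:0",
    "/job:worker/replica:0/task:0/device:CPU:0",
]

missing = visible_tasks([requested]) - visible_tasks(available)
print("Tasks requested but not visible:", missing)
```

If the missing set is non-empty, the process that resolves device placement does not see the other workers' devices, which is consistent with the question of whether a separate controller is needed.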
I believe the tf_cnn_benchmarks suite in general requires GPUs. The graph construction expects at least one GPU per worker.
I believe the way to run on CPUs is to set num_gpu=1 and set the running device to cpu. With those settings the parameter_server update algorithm works fine; I have run many tests on CPUs this way. It is the new distributed_all_reduce mode that fails at execution.
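The CPU setup described above can be sketched as a small command builder. This is an illustration, not a tested launch script: the helper `build_cpu_worker_command` is hypothetical, the host names are placeholders, and the flag spellings (`--device`, `--num_gpus`, `--variable_update`, `--job_name`, `--task_index`, `--ps_hosts`, `--worker_hosts`) are assumed to match the version of tf_cnn_benchmarks.py in use, so check them against the script's `--help` output.

```python
import shlex

def build_cpu_worker_command(task_index, worker_hosts, ps_hosts, job_name="worker"):
    """Assemble a tf_cnn_benchmarks.py command line for a CPU-only distributed run.

    Flag names are assumptions about the script's interface; verify with --help.
    """
    args = [
        "python", "tf_cnn_benchmarks.py",
        "--device=cpu",                        # run the model on CPU
        "--num_gpus=1",                        # one "tower" per worker, as in the thread
        "--variable_update=parameter_server",  # the mode reported to work on CPUs
        f"--job_name={job_name}",
        f"--task_index={task_index}",
        "--ps_hosts=" + ",".join(ps_hosts),
        "--worker_hosts=" + ",".join(worker_hosts),
    ]
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in args)

# Placeholder hosts for illustration only.
cmd = build_cpu_worker_command(
    task_index=0,
    worker_hosts=["worker0:2222", "worker1:2222"],
    ps_hosts=["ps0:2222"],
)
print(cmd)
```

The same command would be issued on every node with its own `task_index`, and once more with `job_name="ps"` on each parameter server host.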
It has not been tested on CPU only. Making it work may take significant effort, but if you want to try, look at tensorflow/contrib/all_reduce/python/all_reduce.py. The assumption that it runs on GPUs is somewhat baked in, but you may be able to make it work without much change.
I will try it. Can you explain what "controller_host" is? Is it supposed to be a different node than workers?
See #64
Okay. Thank you. I will have time to take a look at it again in a few days.
@amathuri Were you able to get the distributed TF working for CPU only? I'd love to get your insight.
Thanks. -Tony
Yes, I have. I tried it with the parameter_server variable update and num_gpu=1. To get good performance on CPUs, TensorFlow needs to be built with MKL as the backend, and num_intra_threads, num_inter_threads, and the OMP_NUM_THREADS environment variable need to be tuned. You may be able to install an MKL TensorFlow wheel from here: https://software.intel.com/en-us/articles/intel-optimized-tensorflow-wheel-now-available
What type of insights are you looking for?
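The thread tuning mentioned above can be sketched as a helper that derives starting values from the machine's core count. The heuristic used here (intra-op threads and OMP_NUM_THREADS equal to the physical core count, inter-op threads equal to the socket count) is an assumption taken from commonly cited starting points, not a result from this thread, and the helper `cpu_thread_settings` is hypothetical. Treat the numbers as a first guess to benchmark against, and verify the flag names against the script's `--help` output.

```python
import os

def cpu_thread_settings(physical_cores, sockets=1):
    """Return starting-point thread settings for an MKL-backed CPU run.

    The values are a heuristic to begin tuning from, not measured optima.
    """
    if physical_cores < 1 or sockets < 1:
        raise ValueError("physical_cores and sockets must be positive")
    return {
        # Environment variable read by the OpenMP runtime used by MKL.
        "env": {"OMP_NUM_THREADS": str(physical_cores)},
        # Benchmark flags named in the thread.
        "flags": [
            f"--num_intra_threads={physical_cores}",
            f"--num_inter_threads={sockets}",
        ],
    }

# Example: a two-socket node with 16 physical cores in total (illustrative numbers).
settings = cpu_thread_settings(physical_cores=16, sockets=2)

# Environment to pass to the benchmark process, e.g. subprocess.run(..., env=env).
env = dict(os.environ, **settings["env"])
print(settings["flags"], env["OMP_NUM_THREADS"])
```

Sweeping a few values around these defaults on a single node before scaling out makes it easier to separate per-node tuning problems from communication problems.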
Excellent. We're running a 4-node CPU cluster and can't seem to get it to scale properly. Could you email me at [email protected]? Thanks.
I did. Thank you.
I found that running the script without any parameters reports the CPU performance.