Data parallelism in benchmark distributed training
How does the benchmark achieve data parallelism in distributed training? There is no `tf.data.Dataset.shard` call in data_utils.py. How is it guaranteed that different workers get different data in one step? Any help is appreciated.
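For context, `tf.data.Dataset.shard(num_shards, index)` is the API the question refers to: it keeps every `num_shards`-th element starting at position `index`, which gives each worker a disjoint slice of the data. A plain-Python sketch of that semantics (the `shard` helper here is illustrative, not from the repo):

```python
def shard(records, num_shards, index):
    """Mimic tf.data.Dataset.shard: keep every num_shards-th record,
    starting at position `index`."""
    return records[index::num_shards]

records = list(range(10))

# With 2 workers, each worker sees a disjoint half of the data.
worker0 = shard(records, num_shards=2, index=0)  # [0, 2, 4, 6, 8]
worker1 = shard(records, num_shards=2, index=1)  # [1, 3, 5, 7, 9]
assert set(worker0).isdisjoint(worker1)
```

Because data_utils.py does not do this, nothing prevents two workers from reading overlapping records in the same step.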
There is no guarantee that different workers get different data in one step. Each worker gets some random data in one step, so the batches are usually different.
What @ppwwyyxx said is correct. We do shuffle the data with a buffer size of 10,000, but training is likely suboptimal because we ignore shift_ratio.
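For reference, `shift_ratio` is meant to offset each worker's starting position in the input file list (roughly `worker_index / num_workers`), so that workers read mostly different portions of the data even without explicit sharding. A simplified sketch of that idea, with a hypothetical helper name:

```python
def shifted_files(filenames, shift_ratio):
    """Rotate the file list by shift_ratio so each worker starts
    reading at a different offset into the dataset."""
    shift = int(len(filenames) * shift_ratio)
    return filenames[shift:] + filenames[:shift]

files = [f"train-{i:05d}" for i in range(8)]

# Worker 1 of 4 would start a quarter of the way through the list.
print(shifted_files(files, shift_ratio=1 / 4))
```

Ignoring shift_ratio means every worker starts at offset 0, and only the shuffle buffer keeps their batches from overlapping.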
Unfortunately, we currently aren't actively working on distributed performance in tf_cnn_benchmarks, and this probably will not be fixed until we start.
@reedwm I hope you do continue work on distributed performance. tf.contrib.distribute is developing quickly now, so this repo should keep pace.
@zheng-xq what's the story on distributed performance across multiple workers in tf_cnn_benchmarks?
As for tf.contrib.distribute, that is the recommended way of distributing across multiple GPUs or workers, as it is easy to use. The only reason tf_cnn_benchmarks does not use tf.contrib.distribute is to allow us to easily test new strategies for increasing performance, either with multi-GPU or multi-worker, without having to modify tf.contrib.distribute itself.
@ppwwyyxx @reedwm Does ignoring shift_ratio slow down training in tf_cnn_benchmarks, since workers may occasionally train on the same small subset of images? Also, when will this issue be fixed? Thank you.