
Data parallelism in the distributed training benchmark

wx1111 opened this issue 6 years ago

How does the benchmark achieve data parallelism in distributed training? There seems to be no tf.dataset.shard call in data_utils.py. How is it guaranteed that different workers get different data at the same step? Any help is appreciated.

wx1111 avatar Jul 06 '18 05:07 wx1111

There is no guarantee that different workers get different data in one step. Each worker draws a random sample of the data at each step, so the batches are usually different.

ppwwyyxx avatar Jul 06 '18 05:07 ppwwyyxx
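
For contrast, here is a minimal sketch of what explicit per-worker sharding with tf.data could look like. This is not what tf_cnn_benchmarks does; `filenames`, `num_workers`, `worker_index`, and `batch_size` are placeholder names used only for illustration.

```python
import tensorflow as tf

def build_worker_dataset(filenames, num_workers, worker_index, batch_size):
    """Hypothetical input pipeline in which each worker sees a disjoint shard."""
    dataset = tf.data.TFRecordDataset(filenames)
    # Keep only every num_workers-th record, starting at this worker's index,
    # so no two workers read the same example within an epoch.
    dataset = dataset.shard(num_workers, worker_index)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(batch_size)
    return dataset
```

With sharding like this, step-level overlap between workers is impossible by construction, whereas the random-sampling approach described above only makes overlap unlikely.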

What @ppwwyyxx said is correct. We do shuffle the data with a buffer size of 10,000, but training is likely suboptimal because we ignore shift_ratio.

Unfortunately, we currently aren't actively working on distributed performance in tf_cnn_benchmarks, and this probably will not be fixed until we start.

reedwm avatar Jul 09 '18 18:07 reedwm
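
To illustrate the idea behind shift_ratio: each worker starts reading the input at a different offset, so independently shuffled pipelines do not all begin on the same records. The sketch below shows the concept only; it is not the actual tf_cnn_benchmarks code path, and the names are placeholders.

```python
import tensorflow as tf

def build_shifted_dataset(filenames, shift_ratio, batch_size):
    """Illustrative only: rotate the file list by a worker-specific fraction."""
    # shift_ratio is assumed to be a per-worker fraction in [0, 1).
    shift = int(len(filenames) * shift_ratio)
    shifted = list(filenames[shift:]) + list(filenames[:shift])
    dataset = tf.data.TFRecordDataset(shifted)
    dataset = dataset.shuffle(buffer_size=10000)  # same buffer size mentioned above
    dataset = dataset.batch(batch_size)
    return dataset
```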

@reedwm I hope you do continue working on distributed performance. tf.contrib.distribute is developing quickly now, so this repo should keep up.

anpark avatar Jul 27 '18 14:07 anpark

@zheng-xq what's the story on distributed performance across multiple workers in tf_cnn_benchmarks?

As for tf.contrib.distribute, that is the recommended way of distributing across multiple GPUs or workers, as it is easy to use. The only reason tf_cnn_benchmarks does not use tf.contrib.distribute is to allow us to easily test new strategies for increasing performance, either with multi-GPU or multi-worker, without having to modify tf.contrib.distribute itself.

reedwm avatar Jul 27 '18 16:07 reedwm
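
For readers landing here, a rough sketch of the tf.contrib.distribute path mentioned above (TF 1.x era), wiring MirroredStrategy into an Estimator. The model_fn and input_fn below are toy stand-ins, not anything from tf_cnn_benchmarks.

```python
import tensorflow as tf

def input_fn():
    # Toy in-memory dataset; a real job would read sharded TFRecords.
    features = tf.random.uniform([64, 10])
    labels = tf.random.uniform([64], maxval=2, dtype=tf.int32)
    return tf.data.Dataset.from_tensor_slices((features, labels)).repeat().batch(8)

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features, 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

# MirroredStrategy replicates the model across local GPUs and aggregates
# gradients; RunConfig(train_distribute=...) plugs it into the Estimator.
strategy = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
estimator.train(input_fn=input_fn, steps=100)
```

tf_cnn_benchmarks instead hand-rolls its replication (variable managers, parameter servers, all-reduce variants) precisely so new strategies can be tested without modifying tf.contrib.distribute, as noted above.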

@ppwwyyxx @reedwm Does ignoring shift_ratio slow down training in tf_cnn_benchmarks? The workers may occasionally train on the same small subset of images. By the way, when will you fix this issue? Thank you.

Hannah-xxl avatar Aug 13 '18 03:08 Hannah-xxl