wx1111

Results 3 issues of wx1111

Hi, I ran the benchmark on my cluster with the VGG-11, VGG-16, and VGG-19 models in distributed mode (1 ps and 4 workers). The accuracy does not increase. The optimizer settings are: optimizer: rmsprop init_learning_rate...

How does the benchmark achieve data parallelism in distributed training? There seems to be no tf.data.Dataset.shard call in data_utils.py. How is it guaranteed that different workers get different data at the...
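
For context on what the question is asking about: sharding usually means each worker reads a disjoint, round-robin slice of the input, which is how `tf.data.Dataset.shard(num_shards, index)` behaves. The sketch below illustrates that scheme in plain Python. It is not taken from the benchmark code; the `shard` helper, the worker count, and the `examples` list are hypothetical and chosen only to mirror the 4-worker setup mentioned above.

```python
from itertools import islice


def shard(examples, num_workers, worker_index):
    """Return every num_workers-th example, starting at worker_index.

    Hypothetical helper that mimics the round-robin selection of
    tf.data.Dataset.shard(num_shards, index); it is not benchmark code.
    """
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return islice(examples, worker_index, None, num_workers)


# Assumed toy setup: 10 examples split across 4 workers.
examples = list(range(10))
shards = [list(shard(examples, 4, w)) for w in range(4)]

for w, s in enumerate(shards):
    print(f"worker {w}: {s}")
```

With this scheme no example is seen by two workers in the same pass, and the union of all shards is the full dataset. Without sharding (or a different per-worker shuffle seed or file split), every worker would iterate over the same data in the same order.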

I started a distributed training run with 16 workers (4 GPUs per worker), and worker0 appeared to hang after printing "Running warm up". I checked all the workers, the...