wx1111 issues


Results: 3 issues of wx1111

Accuracy does not increase for models VGG-11, VGG-16, and VGG-19

Hi, I ran the benchmark on my cluster with the VGG-11, VGG-16, and VGG-19 models in distributed mode (1 ps and 4 workers). The accuracy does not increase. The optimizer settings are: optimizer: rmsprop init_learning_rate...

Data parallelism in benchmark distributed training

How does the benchmark achieve data parallelism in distributed training? It seems that there is no tf.data.Dataset.shard call in data_utils.py. How can it be guaranteed that different workers get different data at the...
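
For readers unfamiliar with what sharding would provide here, the following is a minimal pure-Python sketch of the idea behind `tf.data.Dataset.shard(num_shards, index)`: each worker keeps every `num_shards`-th record, offset by its own index, so the workers see disjoint slices. It says nothing about how the benchmark code actually distributes data; the `shard` helper, the record list, and the worker count of 4 are assumptions made only for illustration.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard(records: Iterable[T], num_workers: int, worker_index: int) -> Iterator[T]:
    """Hypothetical helper: yield every num_workers-th record, starting at worker_index."""
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    for position, record in enumerate(records):
        # A record belongs to exactly one worker: the one whose index
        # equals the record's position modulo the number of workers.
        if position % num_workers == worker_index:
            yield record


# Illustrative data: ten records split across four workers (assumed values).
records = list(range(10))
shards = [list(shard(records, 4, i)) for i in range(4)]

for i, part in enumerate(shards):
    print(f"worker {i}: {part}")
```

Without a step like this (or an equivalent, such as giving each worker different input files or a different shuffle order), every worker could read the same records in the same order.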

Benchmark distributed training with 16 workers hangs with error: Too many pings

I started a distributed training run with 16 workers (4 GPUs per worker), and worker0 appeared to hang after printing "Running warm up". I checked all the workers, the...