Distributed training is running on a single worker

Open asheeshgarg opened this issue 5 years ago • 0 comments

I am running the Docker MNIST example with distributed training, using the Kubernetes template project. I created train.tfrecords and stored it as a volume for all the images under /tmp/data. All 3 nodes come up fine, but training always starts on one server, and I don't see the second worker doing any computation. Do we need to change anything for this to work?
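
A possible first check, sketched under an assumption the issue does not confirm: that the example is built on TensorFlow and describes its cluster through the `TF_CONFIG` environment variable. The helper name `describe_role` is hypothetical. Running something like this inside each pod would show whether every worker actually sees the full cluster definition or only itself.

```python
import json
import os


def describe_role(tf_config_json):
    """Summarize a process's role from a TF_CONFIG-style JSON string.

    Assumption: the training job is configured through TF_CONFIG.
    If the project wires up its cluster differently, this check does not apply.
    """
    if not tf_config_json:
        return "TF_CONFIG is not set"
    config = json.loads(tf_config_json)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    num_workers = len(cluster.get("worker", []))
    return "task {}:{} in a cluster with {} worker(s)".format(
        task.get("type", "?"), task.get("index", "?"), num_workers
    )


if __name__ == "__main__":
    # Print what this particular pod was told about the cluster.
    print(describe_role(os.environ.get("TF_CONFIG", "")))
```

If a pod reports that the variable is unset, or lists only one worker, that pod has no way of knowing about its peers, which would be consistent with training running on a single server.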

asheeshgarg, Apr 01 '19 19:04