ecosystem
Distributed training is running on a single worker
I am running the Docker MNIST example using distributed training with the Kubernetes template project. I created train.tfrecords and stored it as a volume under /tmp/data for all the images. All 3 nodes come up fine, but training always starts on one server. I don't see the second worker doing any computation. Do we need to change anything for this to work?
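To help narrow this down, here is a minimal diagnostic sketch that could be run inside each pod. It assumes the cluster layout reaches the containers through the `TF_CONFIG` environment variable as JSON with `cluster` and `task` keys; the post does not confirm that this setup passes the cluster spec that way. The helper name `describe_task` is hypothetical and only illustrates the check.

```python
import json
import os


def describe_task(tf_config_json):
    """Summarize the cluster role encoded in a TF_CONFIG-style JSON string.

    Assumption: the JSON has a "cluster" dict (job name -> list of addresses)
    and a "task" dict with "type" and "index". Missing keys yield None or 0.
    """
    config = json.loads(tf_config_json or "{}")
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    return {
        "job": task.get("type"),
        "index": task.get("index"),
        "num_workers": len(cluster.get("worker", [])),
        "num_ps": len(cluster.get("ps", [])),
    }


if __name__ == "__main__":
    # Print what this particular pod believes its role is.
    print(describe_task(os.environ.get("TF_CONFIG", "{}")))
```

If every pod prints the same job and index, or the worker count is 0 or 1, the pods are probably not receiving distinct task assignments, which might explain why only one server trains.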