tf_repos

tf distribute

Open zhangyingxia opened this issue 7 years ago • 3 comments

When I run deepfm in distributed mode, I get this error: No worker known as /job:chief/replica:0/task:0. Could you help me?

zhangyingxia avatar Oct 16 '18 08:10 zhangyingxia

run_dist.sh?

lambdaji avatar Oct 16 '18 09:10 lambdaji

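Hi, I'm using the deepfm.py framework and have set TF_CONFIG for distributed training. When I then launch distributed training, it fails with: No worker known as /job:chief/replica:0/task:0. The chief had already started successfully before that, though. Have you run into this kind of error?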

zhangyingxia avatar Oct 18 '18 09:10 zhangyingxia

Please send over your launch script so I can take a look.

lambdaji avatar Oct 19 '18 02:10 lambdaji