distributeTensorflowExample

Synchronous data-parallel mode: nodes cannot communicate

Open weihualiuhupituzi opened this issue 6 years ago • 1 comments

Hello thewintersun,

I am testing your code in a distributed, parallel setup. One machine is designated as the ps, and the two GPU cards on a server act as the two workers. One of the workers is configured as follows:

--ps_hosts=192.168.4.227:2230 --worker_hosts=192.168.4.25:2224,192.168.4.25:2225 --job_name=worker --task_index=0

After launch, it keeps waiting for the workers to respond and the computation never starts:

2018-12-13 12:02:12.073144: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:0

2018-12-13 12:02:29.539926: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1

Could you tell me what I might have overlooked? Thank you very much!

weihua Liu

weihualiuhupituzi, Dec 13 '18 12:12

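First, can the ps server and the worker server reach each other on those network ports?

Second, has every node started up correctly?

After startup, expect to wait roughly half a minute.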

thewintersun, Dec 18 '18 07:12
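One way to act on the two questions above is to probe each address in the cluster with a plain TCP connection. The following is only a rough sketch, not part of the repository: the helper names `parse_hosts`, `is_reachable`, and `check_cluster` are made up for illustration, and it assumes that a node's port accepts TCP connections once its TensorFlow server process is running. A failed probe therefore means either that the node has not started or that the port is blocked between the machines. It should be run from both the ps machine and the worker machine, since connectivity has to work in both directions.

```python
import socket


def parse_hosts(hosts):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    pairs = []
    for item in hosts.split(","):
        host, port = item.strip().rsplit(":", 1)
        pairs.append((host, int(port)))
    return pairs


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_cluster(ps_hosts, worker_hosts, timeout=3.0):
    """Map each 'job:index' name to whether its port accepts connections."""
    status = {}
    for job, hosts in (("ps", ps_hosts), ("worker", worker_hosts)):
        for index, (host, port) in enumerate(parse_hosts(hosts)):
            status["%s:%d" % (job, index)] = is_reachable(host, port, timeout)
    return status


if __name__ == "__main__":
    # Addresses taken from the flags in the question above.
    result = check_cluster(
        "192.168.4.227:2230",
        "192.168.4.25:2224,192.168.4.25:2225",
    )
    for task, ok in sorted(result.items()):
        print(task, "reachable" if ok else "NOT reachable")
```

If every task reports as reachable and the "still waiting" messages continue past the half-minute mentioned above, the next thing to compare is whether all processes were launched with identical `--ps_hosts` and `--worker_hosts` values and differ only in `--job_name` and `--task_index`.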