yefeng


I ran a 2-node, 2-GPU experiment using two containers and got the error below. I would appreciate help resolving it.

### Environment:

A container built from nvcr.io/nvidia/tensorflow:21.12-tf1-py3

### Script:

The resnet script from FastNN

### Launch commands

```
TF_CONFIG='{"cluster":{"worker":["192.168.83.228:6666","192.168.83.228:6667"]},"task":{"type":"worker","index":0}}' bash scripts/train_dp.sh
TF_CONFIG='{"cluster":{"worker":["192.168.83.228:6666","192.168.83.228:6667"]},"task":{"type":"worker","index":0}}' bash scripts/train_dp.sh
```

### Error

```
2023-08-31 01:40:46.786721: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at nccl_communicator.cc:116 : Internal:...
```
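
Both launch commands as posted set `"index":0`. `TF_CONFIG` normally expects each worker to carry its own task index, so this may be a copy-paste slip in the report or may be related to the failure; the truncated log does not say which. As a minimal sketch only (the helper name `make_tf_config` is hypothetical and not part of FastNN), the per-worker values could be generated like this:

```python
import json

# Worker addresses copied from the launch commands in the report.
WORKERS = ["192.168.83.228:6666", "192.168.83.228:6667"]


def make_tf_config(index, workers=WORKERS):
    """Return the TF_CONFIG JSON string for the worker at position `index`.

    Hypothetical helper for illustration; not part of FastNN or TensorFlow.
    """
    if not 0 <= index < len(workers):
        raise ValueError(f"index {index} is out of range for {len(workers)} workers")
    return json.dumps(
        {
            "cluster": {"worker": list(workers)},
            "task": {"type": "worker", "index": index},
        },
        separators=(",", ":"),  # compact form, matching the commands above
    )


if __name__ == "__main__":
    # Print one launch command per worker, each with a distinct task index.
    for i in range(len(WORKERS)):
        print(f"TF_CONFIG='{make_tf_config(i)}' bash scripts/train_dp.sh")
```

Running the script prints one command per container; the cluster section is identical in both and only the task index differs.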