lanyangyang
It seems the [Suggestion settings](https://www.kubeflow.org/docs/components/katib/katib-config/#suggestion-settings) in the ConfigMap don't support `imagePullSecrets` and other similar spec settings in the Deployment. It would probably be more flexible for users to set deployment attributes in...
@paravmellanox I saw your reply in this issue: https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin/issues/16#issuecomment-434556494 _if you are using shared mode of this device plugin, and intent to use overlay network such as calico, contiv etc...
@ymjiang sorry for pasting incorrectly.
@ymjiang I don't use nvidia-docker to launch because I set the Docker default runtime to nvidia on the GPU node, and I can access the GPU inside the container. Here is the docker info: [image]...
`10.94.1.77` is the host IP of the scheduler; the server and scheduler are on the same node. Here is the log with `PS_VERBOSE=2` set. scheduler: ``` [root@--0002 ~]# docker run -it --net=host bytepsimage/byteps_server...
@ymjiang it still seems to hang. scheduler: ``` [root@--0002 ~]# docker run -it --net=host bytepsimage/byteps_server bash root@--0002:~# export DMLC_NUM_WORKER=1 root@--0002:~# export DMLC_ROLE=scheduler root@--0002:~# export DMLC_NUM_SERVER=1 root@--0002:~# export DMLC_PS_ROOT_URI=10.94.1.77 root@--0002:~# export DMLC_PS_ROOT_PORT=1234...
@ymjiang @bobzhuyb it works well when the worker, server, and scheduler are on the same machine with 127.0.0.1 as DMLC_PS_ROOT_URI. Let's focus on the network, so I tried this: on the first node, 10.94.1.77, scheduler ``` [root@--0002...
@ymjiang what do you mean by the scheduler preventing others from connecting to the port? I ran telnet from the worker to the scheduler, and it succeeded. ``` root@-0001:~# telnet 10.94.1.77 1234 Trying 10.94.1.77... Connected...
@ymjiang ok, I will try later. I wonder what network issue there could be if I can already telnet from the worker to the scheduler with the SCHEDULER_IP and SCHEDULER_PORT?
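For reference, the `telnet` check above can be sketched in Python as a plain TCP connect. This is only a minimal sketch: the helper name `can_connect` is hypothetical, and the IP and port in the commented usage line are the ones quoted in this thread. A successful connect only shows that this one port is reachable from this one host; it says nothing about the reverse direction or about other ports.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it mirrors what
    `telnet <host> <port>` checks.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage from a worker node (values taken from this thread):
# print(can_connect("10.94.1.77", 1234))
```

Running the same check from the scheduler node back toward the worker's address would test the opposite direction, which `telnet` from the worker alone does not cover.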
@bobzhuyb @ymjiang here is the log from the scheduler ``` BytePS launching scheduler [03:23:00] src/./zmq_van.h:61: BYTEPS_ZMQ_MAX_SOCKET set to 1024 [03:23:00] src/./zmq_van.h:66: BYTEPS_ZMQ_NTHREADS set to 4 [03:23:00] src/van.cc:357: Bind to role=scheduler, id=1,...