lanyangyang

Results 13 comments of lanyangyang

It seems the [Suggestion settings](https://www.kubeflow.org/docs/components/katib/katib-config/#suggestion-settings) in the ConfigMap don't support `imagePullSecrets` and other similar Deployment spec settings. It would probably be more flexible for users to set deployment attributes in...

@paravmellanox I saw your reply in this issue: https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin/issues/16#issuecomment-434556494 _if you are using shared mode of this device plugin, and intent to use overlay network such as calico, contiv etc...

@ymjiang sorry for pasting incorrectly.

@ymjiang I don't use nvidia-docker to launch because I set the Docker default runtime to nvidia on the GPU node, and I can access the GPU inside the container. Here is the docker info: ![image](https://user-images.githubusercontent.com/29671725/60568446-6f0d1500-9d9f-11e9-9272-b092a83dea6f.png)...

`10.94.1.77` is the host IP of the scheduler; the server and scheduler are on the same node. Here is the log with `PS_VERBOSE=2` set. scheduler:
```
[root@--0002 ~]# docker run -it --net=host bytepsimage/byteps_server...
```

@ymjiang it still seems to hang. scheduler:
```
[root@--0002 ~]# docker run -it --net=host bytepsimage/byteps_server bash
root@--0002:~# export DMLC_NUM_WORKER=1
root@--0002:~# export DMLC_ROLE=scheduler
root@--0002:~# export DMLC_NUM_SERVER=1
root@--0002:~# export DMLC_PS_ROOT_URI=10.94.1.77
root@--0002:~# export DMLC_PS_ROOT_PORT=1234...
```
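For reference, the variables exported in the scheduler session above can be kept consistent across roles with a small helper. This is only a sketch: `dmlc_env` and `export_lines` are hypothetical names, not part of BytePS, and it assumes that every role shares the same `DMLC_PS_ROOT_URI` and `DMLC_PS_ROOT_PORT` and differs only in `DMLC_ROLE`. Any further variables a worker or server may need are not covered.

```python
# Hypothetical helper (not part of BytePS): builds the DMLC_* variables
# that the session above exports by hand.
VALID_ROLES = {"scheduler", "server", "worker"}


def dmlc_env(role, root_uri, root_port, num_worker=1, num_server=1):
    """Return the DMLC_* environment for one role as a dict of strings."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if not 0 < int(root_port) < 65536:
        raise ValueError(f"port out of range: {root_port!r}")
    return {
        "DMLC_ROLE": role,
        "DMLC_NUM_WORKER": str(num_worker),
        "DMLC_NUM_SERVER": str(num_server),
        "DMLC_PS_ROOT_URI": root_uri,
        "DMLC_PS_ROOT_PORT": str(root_port),
    }


def export_lines(env):
    """Render the environment as shell `export` lines, sorted by name."""
    return [f"export {name}={value}" for name, value in sorted(env.items())]
```

Calling `export_lines(dmlc_env("scheduler", "10.94.1.77", 1234))` yields the same five assignments as the transcript, which makes it easier to spot a node whose root URI or port differs from the others.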

@ymjiang @bobzhuyb it works well when the worker, server, and scheduler are on the same machine with 127.0.0.1 as `DMLC_PS_ROOT_URI`. Let's focus on the network. So I tried this: on the first node, 10.94.1.77, scheduler:
```
[root@--0002...
```

@ymjiang what do you mean by the scheduler preventing others from connecting to the port? I ran telnet from the worker to the scheduler, and it succeeded.
```
root@-0001:~# telnet 10.94.1.77 1234
Trying 10.94.1.77...
Connected...
```

@ymjiang OK, I will try later. I wonder what the network issue could be if I can already telnet from the worker to the scheduler with the SCHEDULER_IP and SCHEDULER_PORT?
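The telnet test can be scripted so that it is easy to repeat from each node. The sketch below is a plain TCP reachability check, with `can_connect` as a hypothetical helper name. Note that a successful connection only shows that one direction (here, worker to scheduler) works on that one port. Whether the reverse direction or other ports matter for this hang is an assumption the thread does not settle.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Roughly equivalent to `telnet host port` succeeding. It says nothing
    about the opposite direction or about any other port.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `can_connect("10.94.1.77", 1234)` run on the worker repeats the telnet check above. Running the same call on the scheduler node against the worker's address would test the other direction.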

@bobzhuyb @ymjiang here is the log from the scheduler:
```
BytePS launching scheduler
[03:23:00] src/./zmq_van.h:61: BYTEPS_ZMQ_MAX_SOCKET set to 1024
[03:23:00] src/./zmq_van.h:66: BYTEPS_ZMQ_NTHREADS set to 4
[03:23:00] src/van.cc:357: Bind to role=scheduler, id=1,...
```