gpushare-scheduler-extender Post http://127.0.0.1:32766/gpushare-scheduler/filter: dial tcp 127.0.0.1:32766: connect: connection refused


Open: Pallavi-code opened this issue 5 years ago • 4 comments

Thanks for your device plugin. I have installed the GPU share device plugin, but when I create the sample, binpack-1-0 stays in the Pending state.

kubectl describe pod binpack1-0
Events:
  Type     Reason            Age                From               Message
  Warning  FailedScheduling  44s (x2 over 44s)  default-scheduler  Post http://127.0.0.1:32766/gpushare-scheduler/filter: dial tcp 127.0.0.1:32766: connect: connection refused

root@kmaster:~# kubectl-inspect-gpushare
NAME   IPADDRESS        GPU Memory(GiB)
node2  192.168.193.239  0/8

Allocated/Total GPU Memory In Cluster: 0/8 (0%)

I am able to see the GPU share output above. Please help me fix this issue. Thanks.

Apr 08 '19 10:04 Pallavi-code
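The scheduler event says kube-scheduler could not open a TCP connection to 127.0.0.1:32766, the address in the extender's filter URL, so nothing accepted the connection on the scheduler's node. A minimal way to repeat that check by hand is a plain TCP probe. This is a sketch, not part of gpushare itself: it assumes bash (for its `/dev/tcp` redirection) and coreutils `timeout`, and `probe_port` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# probe_port HOST PORT -> prints "open" if a TCP connection to HOST:PORT
# succeeds, "closed" otherwise (refused, timed out, or unreachable).
# Hypothetical helper; relies on bash's /dev/tcp and coreutils `timeout`.
probe_port() {
  local host="$1" port="$2"
  # A child bash tries to open the connection on fd 3; if the redirection
  # fails, the child exits non-zero. `timeout 3` caps the wait on filtered ports.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
```

Run on the node where kube-scheduler runs, `probe_port 127.0.0.1 32766` printing `closed` matches what the scheduler reports in the event above.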

Could you please check the output of:

kubectl get po -n kube-system | grep gpushare

Apr 08 '19 11:04 cheyang

root@kmaster:~# kubectl get po -n kube-system | grep gpushare
gpushare-device-plugin-ds-fh6bq           1/1   Running   1   3d18h
gpushare-schd-extender-6784448b67-w4bhv   1/1   Running   0   19h

Apr 09 '19 05:04 Pallavi-code

> Could you please check the output of:
> kubectl get po -n kube-system | grep gpushare

Hi, I am facing exactly the same issue. Here is the result:

gpushare-device-plugin-ds-tb87g          1/1   Running   0   2d
gpushare-schd-extender-989c6c7fc-n29bp   0/1   Pending   0   36m

Any idea how to fix this issue? Thank you.

Jun 03 '19 16:06 WBQ1995
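The two `kubectl get po` outputs above differ in one important way. In WBQ1995's output the gpushare-schd-extender pod is Pending (0/1), so no ready extender pod backs port 32766, which is consistent with "connection refused". In Pallavi-code's output both pods are Running and ready, so the cause there likely lies elsewhere. A minimal sketch that flags non-ready rows in that output follows; `check_pods` is a hypothetical helper, and it assumes the default column order NAME READY STATUS RESTARTS AGE.

```shell
#!/usr/bin/env bash
# check_pods: reads `kubectl get po` rows (NAME READY STATUS RESTARTS AGE)
# on stdin and reports pods that are not Running or not fully ready.
# Hypothetical helper; assumes the default column order shown above.
check_pods() {
  awk '
    $1 == "NAME" { next }   # skip the header row if it was not grepped away
    NF < 3       { next }   # ignore blank or truncated lines
    {
      split($2, r, "/")     # READY is "<ready>/<total>"
      if ($3 != "Running" || r[1] != r[2]) {
        printf "NOT READY: %s (%s, %s)\n", $1, $3, $2
        bad = 1
      }
    }
    END { if (!bad) print "all pods ready" }
  '
}
```

Usage: `kubectl get po -n kube-system | grep gpushare | check_pods`. For WBQ1995's output it would report the extender pod as not ready (Pending, 0/1).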

Is this problem solved? Which version of Kubernetes were you using?

I faced this issue before because kube-proxy was running in IPVS mode instead of iptables mode. IPVS doesn't work well with NodePort here: it does not route a request sent to localhost to the gpushare-scheduler pod when that pod is running on another node, even though the source node is listening on the service port (32766 in this case).

So the request only succeeds if kube-scheduler and gpushare-scheduler are running on the same node.

This issue shows more details about this problem.

I'm running gpushare-scheduler as a DaemonSet on all master nodes so that it works properly.

Another workaround is to disable IPVS and use iptables mode instead.

Oct 05 '20 02:10 igorcezar
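igorcezar's explanation depends on two facts that can be checked: which mode kube-proxy runs in, and whether an extender pod shares a node with each kube-scheduler pod. The sketch below only parses text fed to it on stdin; it does not talk to the cluster. `proxy_mode` and `colocation` are hypothetical helpers, and the sketch assumes a kubeadm-style setup where the kube-proxy configuration lives in the `kube-system/kube-proxy` ConfigMap and scheduler pods are named `kube-scheduler-<node>`.

```shell
#!/usr/bin/env bash
# Offline checks for the two conditions described above; both read text on
# stdin. Helper names are hypothetical.

# proxy_mode: prints kube-proxy's mode from its config text (for example the
# kube-system/kube-proxy ConfigMap on kubeadm clusters). An empty or missing
# "mode" means the Linux default, iptables.
proxy_mode() {
  local mode
  mode=$(sed -nE 's/^[[:space:]]*mode:[[:space:]]*"?([A-Za-z]*)"?[[:space:]]*$/\1/p' | head -n 1)
  echo "${mode:-iptables}"
}

# colocation: reads "NAME NODE" rows and, for every node that hosts a
# kube-scheduler pod, says whether a gpushare-schd-extender pod runs there too.
colocation() {
  awk '
    $1 ~ /^kube-scheduler/         { sched[$2] = 1 }
    $1 ~ /^gpushare-schd-extender/ { ext[$2] = 1 }
    END {
      for (n in sched)
        printf "%s: %s\n", n, ((n in ext) ? "extender co-located" : "NO extender on this node")
    }
  ' | sort
}
```

Usage, under the same assumptions: `kubectl -n kube-system get configmap kube-proxy -o yaml | proxy_mode` and `kubectl get po -n kube-system -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName | colocation`. If the mode is `ipvs` and a scheduler node shows no extender, that matches the failure case igorcezar describes. On many clusters, `curl -s http://127.0.0.1:10249/proxyMode` run on a node also reports the active kube-proxy mode.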
