gpushare-scheduler-extender
Post http://127.0.0.1:32766/gpushare-scheduler/filter: dial tcp 127.0.0.1:32766: connect: connection refused
Thanks for your device plugin. I have installed the GPU share device plugin, but when I create the sample, binpack-1-0 stays in the Pending state.
kubectl describe pod binpack1-0
Events:
  Type     Reason            Age                From               Message
  Warning  FailedScheduling  44s (x2 over 44s)  default-scheduler  Post http://127.0.0.1:32766/gpushare-scheduler/filter: dial tcp 127.0.0.1:32766: connect: connection refused
root@kmaster:~# kubectl-inspect-gpushare
NAME   IPADDRESS        GPU Memory(GiB)
node2  192.168.193.239  0/8
Allocated/Total GPU Memory In Cluster: 0/8 (0%)
I am able to see the above result for gpu share. Please help me to fix this issue. Thanks
Could you please check the output of:
kubectl get po -n kube-system | grep gpushare
root@kmaster:~# kubectl get po -n kube-system | grep gpushare
gpushare-device-plugin-ds-fh6bq          1/1  Running  1  3d18h
gpushare-schd-extender-6784448b67-w4bhv  1/1  Running  0  19h
Hi, I am facing exactly the same issue, and here is the result:
gpushare-device-plugin-ds-tb87g         1/1  Running  0  2d
gpushare-schd-extender-989c6c7fc-n29bp  0/1  Pending  0  36m
Any idea how to fix this issue? Thank you.
Is this problem solved? Which version of kubernetes were you using?
I faced this issue before because kube-proxy was running in IPVS mode instead of iptables mode. In my setup IPVS did not work well with NodePort: a request to localhost was not routed to the gpushare-scheduler pod when that pod was running on another node, even though the source node was listening on the service port (32766 in this case).
So the request will only succeed if kube-scheduler and gpushare-scheduler are both running on the same node.
This issue shows more details about this problem.
I'm running gpushare-scheduler as a DaemonSet on all master nodes so that it works properly.
Another workaround would be to disable IPVS and use iptables.
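To tell whether you are hitting the IPVS case, it may help to check which mode kube-proxy is configured for. Below is a minimal sketch, not an official tool: `proxy_mode` is a hypothetical helper name, and it assumes the kube-proxy configuration is available as YAML text with a top-level `mode:` field. An empty or missing value is treated as iptables, which I believe is the Linux default, but please verify this for your version.

```shell
#!/bin/sh
# proxy_mode (hypothetical helper): read a kube-proxy configuration from
# stdin and print the configured proxy mode.
# Assumption: the mode appears on its own line as `mode: "ipvs"`,
# `mode: ipvs` or `mode: ""`, possibly indented (as inside a ConfigMap).
proxy_mode() {
  # Extract the value of the first line that starts with "mode:".
  # Keys such as detectLocalMode do not match, because the line must
  # begin (after optional whitespace) with "mode:".
  mode=$(sed -n 's/^[[:space:]]*mode:[[:space:]]*"\{0,1\}\([A-Za-z]*\)"\{0,1\}[[:space:]]*$/\1/p' | head -n 1)
  # Assumption: an empty or missing value means the default, iptables.
  echo "${mode:-iptables}"
}
```

On a kubeadm-style cluster, where the configuration normally lives in the kube-proxy ConfigMap, it could be used as `kubectl -n kube-system get configmap kube-proxy -o yaml | proxy_mode`. If it prints ipvs and the extender pod is not on the same node as kube-scheduler, the two workarounds above are worth trying.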