Fail to create kube-scheduler

Open ylhsiehitri opened this issue 6 years ago • 5 comments

Hi,

I tried to create the kube-scheduler with kubectl create -f https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/config/kube-scheduler.yaml, but it failed.

$ kubectl get pods -n kube-system
NAME                                     READY   STATUS
kube-scheduler                           0/1     CrashLoopBackOff
kube-scheduler-leadtek-gs4820            1/1     Running
gpushare-device-plugin-ds-hs4kt          1/1     Running
gpushare-schd-extender-978bd945b-sqhzj   1/1     Running
...
$ kubectl logs -n kube-system kube-scheduler
failed to create listener: failed to listen on 127.0.0.1:10251: listen tcp 127.0.0.1:10251: bind: address already in use

Even if I remove the livenessProbe section from the aforementioned kube-scheduler.yaml, kubectl logs still shows the same error.

What's going wrong...?

Thanks!

ylhsiehitri, Aug 29 '19 15:08
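
The bind error in the log above is what any process gets when it tries to listen on an address and port that another process already holds. Here the existing kube-scheduler-leadtek-gs4820 pod is presumably the one holding 127.0.0.1:10251, which is an assumption the thread does not confirm. A minimal Python sketch of the same failure, not specific to Kubernetes; the helper names try_bind and demo_conflict are hypothetical and used only for illustration:

```python
import errno
import socket


def try_bind(host: str, port: int) -> socket.socket:
    """Bind and listen on host:port, returning the open socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def demo_conflict() -> int:
    """Return the errno raised when a second listener binds an occupied port."""
    # Port 0 lets the OS pick a free port, so the sketch does not depend on
    # 10251 being available on the machine where it runs.
    first = try_bind("127.0.0.1", 0)
    port = first.getsockname()[1]
    try:
        second = try_bind("127.0.0.1", port)
    except OSError as exc:
        return exc.errno
    else:
        second.close()
        return 0
    finally:
        first.close()


if __name__ == "__main__":
    code = demo_conflict()
    print(errno.errorcode.get(code, "no error"))
```

The second listener fails for as long as the first one keeps the port open, so removing the livenessProbe would not be expected to change the outcome.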

It seems I misunderstood. Does section 2 of the installation guide [1] mean adding the settings from 2.1 and 2.2 to the default kube-scheduler and replacing the default kube-scheduler with the modified one?

If so, I tried modifying /etc/kubernetes/manifests/kube-scheduler.yaml. After the change, "kubectl get po -n kube-system" shows the pod "kube-scheduler-leadtek-gs4820" restarting ("leadtek-gs4820" is the node name), but when I examine the running config with "kubectl edit po -n kube-system kube-scheduler-leadtek-gs4820", the update is not there.

[1] https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md

ylhsiehitri, Sep 06 '19 09:09

Hello, yes, you do need to modify the default kube-scheduler file. However, when I deployed a GPU-sharing pod after completing the configuration, it had no effect. Only after I restarted the machine did it work as expected.

Vae1997, Oct 15 '19 06:10

@Vae1997, why can't we create one more scheduler instead of modifying the default one?

ide8, Oct 16 '19 17:10

@ide8 Hello, I am just a beginner and the underlying principle is not very clear to me, so I can only share my thoughts:

If you modify the default scheduler, k8s should pick up the change through its own internal mechanisms and redeploy the modified scheduler, so your changes take effect without affecting the rest of the cluster.

Conversely, if you deploy a scheduler yourself, it may conflict with the default scheduling mechanism, or your changes may simply not take effect. In the worst case, the cluster's original scheduling mechanism stops working.

Of course, there should be a way to delete the default scheduler and redeploy it as needed, as one would when building a cluster from binary files. (I haven't tried building a k8s cluster from binaries, though, so I'm not sure whether this operation would affect an existing cluster.)

Vae1997, Oct 17 '19 08:10

Quoting @Vae1997: "Hello, yes, you do need to modify the default kube-scheduler file. However, when I deployed a GPU-sharing pod after completing the configuration, it had no effect. Only after I restarted the machine did it work as expected."

My experience was similar to @Vae1997's restart: I happened to reinstall the OS for an unrelated reason, and after that the problem just disappeared...

ylhsiehitri, Nov 04 '19 09:09