gpushare-scheduler-extender

Scheduling GPUs on GKE: No access to master node

Status: Open. sihrc opened this issue 5 years ago (4 comments).

We're trying to deploy the scheduler extender on GKE, but the master node is not among the cluster nodes we can reach.

None of the nodes match the master node label. It appears the master node is fully managed by GKE.

We get these logs in the scheduler:

I0518 23:57:53.520315       1 controller_utils.go:1025] Waiting for caches to sync for scheduler controller
I0518 23:57:53.620497       1 controller_utils.go:1032] Caches are synced for scheduler controller
I0518 23:57:53.620576       1 leaderelection.go:185] attempting to acquire leader lease  kube-system/kube-scheduler...

Is there any way to avoid having to use the master node? The master node is accessible via the Kubernetes API.

sihrc, May 19 '19 00:05

I don't think this is possible, because the extender depends on configuration changes to kube-scheduler. You could open a ticket with GKE about this.

cheyang, May 21 '19 12:05

I have opened a ticket: https://issuetracker.google.com/issues/133184667

If anyone else is running into the same issue, please star it and add your own use case in the comments there. That will encourage Google to prioritize it.

sihrc, May 21 '19 16:05

@cheyang Can gpushare-scheduler-extender be used standalone, without the existing Kubernetes scheduler?

sihrc, Jun 18 '19 20:06

It uses the scheduler extender mechanism, so the default scheduler's configuration has to be changed. I'm afraid that's a hard requirement.

cheyang, Jun 19 '19 22:06
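For readers unfamiliar with the mechanism cheyang describes: a scheduler extender is a separate HTTP service that kube-scheduler calls during scheduling (for example, to filter candidate nodes), but only if the scheduler's own configuration lists the extender's URL. That configuration lives on the control plane, which is why a managed control plane like GKE's blocks this setup. Below is a minimal Python sketch of the filter side, under stated assumptions: the JSON field names (`Pod`, `NodeNames`, `FailedNodes`, `Error`) are modeled on the extender v1 types and should be checked against your Kubernetes version, the resource name `aliyun.com/gpu-mem` is assumed, and `FREE_GPU_MEM`, `requested_gpu_mem`, `filter_nodes`, and `FilterHandler` are hypothetical names, not part of gpushare-scheduler-extender.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical snapshot of free GPU memory per node (GiB). A real extender
# would derive this from cluster state instead of hard-coding it.
FREE_GPU_MEM = {"node-a": 8, "node-b": 2}

# Assumed extended-resource name for shared GPU memory.
GPU_MEM_RESOURCE = "aliyun.com/gpu-mem"


def requested_gpu_mem(pod):
    """Sum the GPU-memory limits of all containers in a pod.

    Assumes plain integer quantities such as "4" (not "4Gi").
    """
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        total += int(limits.get(GPU_MEM_RESOURCE, 0))
    return total


def filter_nodes(args, free_gpu_mem):
    """Return the nodes that fit the pod, plus a reason for each node that doesn't.

    Assumes nodeCacheCapable is enabled, so the request carries only
    candidate node names (NodeNames) rather than full Node objects.
    """
    need = requested_gpu_mem(args.get("Pod") or {})
    fits, failed = [], {}
    for name in args.get("NodeNames") or []:
        free = free_gpu_mem.get(name, 0)
        if free >= need:
            fits.append(name)
        else:
            failed[name] = f"insufficient GPU memory: need {need}, free {free}"
    return {"NodeNames": fits, "FailedNodes": failed, "Error": ""}


class FilterHandler(BaseHTTPRequestHandler):
    """Answers every POST with a filter result. A real extender would
    route on the path (e.g. <urlPrefix>/filter vs. <urlPrefix>/bind)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        args = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(filter_nodes(args, FREE_GPU_MEM)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, `http.server.HTTPServer(("127.0.0.1", 8888), FilterHandler).serve_forever()` serves the handler on an arbitrary port. kube-scheduler would only ever send it requests if the scheduler's extender configuration pointed at that address, and that is exactly the step GKE's managed control plane doesn't expose.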