gpushare-scheduler-extender
Adapting for use with managed control plane
I have an EKS cluster and am hoping to adapt this to run as a second scheduler, since I don't believe I can edit the default kube-scheduler as your installation instructions require (correct me if I am wrong).
I have edited the YAML slightly to follow the guide at the link below: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
But it doesn't seem to be working (admittedly, I knew this was wishful thinking). Any ideas what else I need to do? I am very new to Go, so I'm struggling to dig into the source code.
kind: ServiceAccount
apiVersion: v1
metadata:
  name: gpu-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: gpu-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1 #extensions/v1beta1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: gpu-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
    spec:
      serviceAccountName: gpu-scheduler
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpu-scheduler:1.11-d170d8a
        name: gpu-scheduler
        env:
        - name: LOG_LEVEL
          value: debug
        - name: PORT
          value: "12345"
      hostNetwork: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        operator: Exists
        key: node.cloudprovider.kubernetes.io/uninitialized
      nodeSelector:
        node-role.kubernetes.io/master: ""
I need to implement a full second scheduler, don't I?
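A second scheduler probably does not have to be written from scratch. Per the multiple-schedulers guide linked above, it can be the stock kube-scheduler binary started with its own KubeSchedulerConfiguration, and that configuration is where an extender is declared. The Deployment in the post runs only the gpushare extender's HTTP server, not a scheduler. Below is a minimal sketch of that configuration, assuming Kubernetes 1.25+ (for the `kubescheduler.config.k8s.io/v1` API). The ConfigMap and Service names are hypothetical. The `/gpushare-scheduler` URL prefix and the `aliyun.com/gpu-mem` resource name follow the project's own scheduler config, but should be checked against the version you deploy.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpu-scheduler-config          # hypothetical name
  namespace: kube-system
data:
  gpu-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    # No clientConnection.kubeconfig: kube-scheduler uses the pod's in-cluster credentials.
    leaderElection:
      leaderElect: false              # single replica, so no lease is needed
    profiles:
    - schedulerName: gpu-scheduler    # pods opt in with spec.schedulerName: gpu-scheduler
    extenders:
    - urlPrefix: "http://gpushare-schd-extender.kube-system.svc:12345/gpushare-scheduler"
      filterVerb: filter              # scheduler POSTs to <urlPrefix>/filter
      bindVerb: bind                  # scheduler delegates binding to <urlPrefix>/bind
      enableHTTPS: false
      nodeCacheCapable: true
      managedResources:
      - name: aliyun.com/gpu-mem      # only pods requesting this resource go to the extender
        ignoredByScheduler: false
      ignorable: false
---
apiVersion: v1
kind: Service
metadata:
  name: gpushare-schd-extender        # hypothetical name, referenced by urlPrefix above
  namespace: kube-system
spec:
  # Matches the labels on the extender Deployment in the post.
  selector:
    component: scheduler
    tier: control-plane
  ports:
  - port: 12345                       # the PORT env value in the post
    targetPort: 12345
```

Separately, EKS does not expose control-plane nodes, so the `node-role.kubernetes.io/master` nodeSelector in the post would likely leave the extender pod Pending. It would need to run on a worker node instead.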
I think it can only work when the Kubernetes default scheduler can be configured.
@cheyang, why?
@tlives, any success here?
@ide8 I'm afraid not; we decided on a different setup. I did see this but haven't tried it: https://github.com/Deepomatic/shared-gpu-nvidia-k8s-device-plugin
The reason it won't work is that in a managed service you don't have access to the scheduler config to modify it (see the installation instructions; this is a requirement).
Any updates on this? I was hoping that by running a second scheduler, we could simply call out to it and apply the extenders to that scheduler instead.
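That idea can be sketched with two objects. The first is a stock kube-scheduler Deployment that loads its KubeSchedulerConfiguration from a ConfigMap. Here it is the hypothetical `gpu-scheduler-config`, expected to hold a `gpu-scheduler` profile plus the gpushare extender entry. The second is a pod that opts into that scheduler via `spec.schedulerName`. The following are assumptions:

* The image tag is a placeholder to be matched to the cluster's Kubernetes version.
* The ServiceAccount is the `gpu-scheduler` one from the original post.
* The upstream multiple-schedulers guide also grants a few extra RBAC bindings (for example `system:volume-scheduler`), which may be needed as well.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-kube-scheduler            # hypothetical name
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-kube-scheduler
  template:
    metadata:
      labels:
        app: gpu-kube-scheduler       # distinct from the extender's labels
    spec:
      serviceAccountName: gpu-scheduler
      containers:
      - name: kube-scheduler
        image: registry.k8s.io/kube-scheduler:v1.29.0   # match the cluster's minor version
        command:
        - kube-scheduler
        - --config=/etc/gpu-scheduler/gpu-scheduler-config.yaml
        volumeMounts:
        - name: config
          mountPath: /etc/gpu-scheduler
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: gpu-scheduler-config  # hypothetical ConfigMap holding the config file
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-test                # hypothetical name
spec:
  schedulerName: gpu-scheduler        # must match the profile's schedulerName
  containers:
  - name: test
    image: busybox:1.36
    command: ["sleep", "3600"]
    resources:
      limits:
        aliyun.com/gpu-mem: 3         # units depend on the device plugin's configuration
```

Pods that omit `schedulerName` keep going to the managed default scheduler. Only opted-in pods pass through the extender, which is why this layout does not require touching the EKS control plane.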
@tlives any updates from your end on this?