Volcano Uneven scheduling
Please describe your problem in detail
My k8s nodes are as follows:
NAME             STATUS   ROLES                  AGE    VERSION
prod.ds.03.idc   Ready    control-plane,master   350d   v1.21.9
prod.ds.04.idc   Ready    control-plane,master   350d   v1.21.9
prod.ds.05.idc   Ready    control-plane,master   350d   v1.21.9
prod.ds.06.idc   Ready    <none>                 350d   v1.21.9
prod.ds.07.idc   Ready    <none>                 350d   v1.21.9
prod.ds.08.idc   Ready    <none>                 350d   v1.21.9
prod.ds.12.idc   Ready    <none>                 221d   v1.21.9
prod.ds.13.idc   Ready    <none>                 221d   v1.21.9
prod.ds.14.idc   Ready    <none>                 221d   v1.21.9
Each node has 16 CPU cores and 64 GB of memory. The masters are set to unschedulable, and the label node=realtime is set on the three nodes prod.ds.12.idc, prod.ds.13.idc, and prod.ds.14.idc:
kubectl get node --show-labels
NAME STATUS ROLES AGE VERSION LABELS
prod.ds.03.idc Ready control-plane,master 350d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.03.idc,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
prod.ds.04.idc Ready control-plane,master 350d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.04.idc,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
prod.ds.05.idc Ready control-plane,master 350d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.05.idc,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
prod.ds.06.idc Ready <none> 350d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.06.idc,kubernetes.io/os=linux,node=offline
prod.ds.07.idc Ready <none> 350d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.07.idc,kubernetes.io/os=linux,node=offline
prod.ds.08.idc Ready <none> 350d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.08.idc,kubernetes.io/os=linux,node=offline
prod.ds.12.idc Ready <none> 221d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.12.idc,kubernetes.io/os=linux,node=realtime
prod.ds.13.idc Ready <none> 221d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.13.idc,kubernetes.io/os=linux,node=realtime
prod.ds.14.idc Ready <none> 221d v1.21.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=prod.ds.14.idc,kubernetes.io/os=linux,node=realtime
I created a queue:
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: realtime
spec:
  capability:
    cpu: "48"
    memory: 192Gi
  weight: 1
I use scheduled tasks to trigger many Kubernetes Deployments at the same time so that they are deployed and run in the realtime queue. However, I found that the pods were not evenly distributed across the three nodes:
$kubectl get pod -o wide | grep prod.ds.12.idc | wc -l
27
$kubectl get pod -o wide | grep prod.ds.13.idc | wc -l
22
$kubectl get pod -o wide | grep prod.ds.14.idc | wc -l
5
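The three counts above can also be collected in one pass. This is only a minimal sketch: the helper names `count_pods_per_node` and `fetch_node_column` are made up for illustration, and it assumes `kubectl` is on the PATH and already points at the right cluster and namespace.

```python
import subprocess
from collections import Counter


def count_pods_per_node(node_column_text):
    """Tally pods per node from text that holds one node name per line."""
    names = (line.strip() for line in node_column_text.splitlines())
    # Pods that are not scheduled yet show "<none>" in this column; skip them.
    return Counter(name for name in names if name and name != "<none>")


def fetch_node_column():
    """Ask kubectl for the node name of every pod in the current namespace."""
    # Assumption: kubectl is installed and configured for this cluster.
    cmd = ["kubectl", "get", "pods", "--no-headers",
           "-o", "custom-columns=NODE:.spec.nodeName"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Calling `count_pods_per_node(fetch_node_column())` returns a mapping from node name to pod count, which makes the imbalance visible without one grep per node.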
This is the Deployment yaml:
apiVersion: apps/v1
kind: Deployment
...
spec:
  selector:
    matchLabels:
      app: flinksync-4pi1nof99gi3-63o0fd9ocmyr78ch
      component: jobmanager
      type: flink-native-kubernetes
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: pg-flinksync-4pi1nof99gi3-63o0fd9ocmyr78ch
      labels:
        app: flinksync-4pi1nof99gi3-63o0fd9ocmyr78ch
        component: jobmanager
        query-tag: flinksync-4pi1nof99gi3
        type: flink-native-kubernetes
      name: pod-template
    spec:
      ...
      containers:
      nodeSelector:
        node: realtime
      schedulerName: volcano
      ...
This is the volcano scheduler conf:
volcano-scheduler.conf: |
  actions: "enqueue, allocate, backfill"
  tiers:
  - plugins:
    - name: priority
    - name: gang
      enablePreemptable: false
    - name: conformance
  - plugins:
    - name: overcommit
    - name: drf
      enablePreemptable: false
    - name: predicates
    - name: proportion
    - name: nodeorder
    - name: binpack
This scheduling puts a lot of pressure on two nodes, while some nodes are idle. How can we make the scheduling more balanced?
Any other relevant information
Kubernetes version: v1.21.9
OS: CentOS Linux release 7.9.2009 (Core)
This is because of the binpack plugin: it scores nodes that are already heavily used higher, so new pods fill up one node before spilling onto the next. See https://volcano.sh/en/docs/plugins/#binpack
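To make the effect concrete, here is a rough toy simulation. It is not Volcano's actual scoring code: it looks at CPU only, ignores memory and every other plugin, and the 500m request per pod is a made-up number. It just contrasts a "fill the fullest node first" score with a "pick the emptiest node" score on three 16-core nodes.

```python
NODES = ["prod.ds.12.idc", "prod.ds.13.idc", "prod.ds.14.idc"]
CAPACITY_MILLICORES = 16_000   # 16 cores per node, as in the question
REQUEST_MILLICORES = 500       # hypothetical CPU request per pod


def binpack_score(used, request, capacity):
    # Higher score for nodes that would end up fuller (toy binpack).
    return (used + request) / capacity


def spread_score(used, request, capacity):
    # Higher score for nodes that would keep more free room (toy spreading).
    return (capacity - used - request) / capacity


def place(pod_count, request, nodes, capacity, score):
    """Place pods one at a time on the best-scoring node that still fits."""
    used = {node: 0 for node in nodes}
    counts = {node: 0 for node in nodes}
    for _ in range(pod_count):
        feasible = [n for n in nodes if used[n] + request <= capacity]
        if not feasible:
            break  # no node has room left
        # max() keeps the first node on ties, so the outcome is deterministic.
        best = max(feasible, key=lambda n: score(used[n], request, capacity))
        used[best] += request
        counts[best] += 1
    return counts


print(place(54, REQUEST_MILLICORES, NODES, CAPACITY_MILLICORES, binpack_score))
# {'prod.ds.12.idc': 32, 'prod.ds.13.idc': 22, 'prod.ds.14.idc': 0}
print(place(54, REQUEST_MILLICORES, NODES, CAPACITY_MILLICORES, spread_score))
# {'prod.ds.12.idc': 18, 'prod.ds.13.idc': 18, 'prod.ds.14.idc': 18}
```

The real numbers in the question (27 / 22 / 5) differ because the actual scheduler combines several plugins and real resource requests, but the pattern of one node filling up before the next is the same.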
So if I delete the binpack configuration from the Volcano scheduler conf, will scheduling be more uniform across the nodes?
volcano-scheduler.conf: |
  actions: "enqueue, allocate, backfill"
  tiers:
  - plugins:
    - name: priority
    - name: gang
      enablePreemptable: false
    - name: conformance
  - plugins:
    - name: overcommit
    - name: drf
      enablePreemptable: false
    - name: predicates
    - name: proportion
    - name: nodeorder
    - name: binpack ## delete this
yes, you are right
thank you
Has your problem been solved?