Allocated status of queue is wrong when deploying multiple Volcano schedulers with a StatefulSet
What happened: We deployed 3 Volcano schedulers with a StatefulSet, following this document: https://github.com/volcano-sh/volcano/blob/master/docs/design/deploy-multi-volcano-schedulers-without-using-selector.md
kubectl get pod -n volcano-system|grep volcano-scheduler
volcano-scheduler-0 1/1 Running 0 20h
volcano-scheduler-1 1/1 Running 0 20h
volcano-scheduler-2 1/1 Running 0 20h
We submitted 5 jobs to the same queue. Each job has 2 tasks and each task requests 1 CPU, so the allocated CPU of the queue should be 10. However, the queue status reports a different value.
kubectl get vj -n ns-root|grep Running
job-169536577254725355276-root Running 2 2 14m
job-169536577471915548993-root Running 2 2 14m
job-169536577618095753595-root Running 2 2 14m
job-169536577780397649277-root Running 2 2 14m
job-169536577946553449829-root Running 2 2 14m
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"scheduling.volcano.sh/v1beta1","kind":"Queue","metadata":{"annotations":{},"name":"pdcpu"},"spec":{"reclaimable":false,"weight":1}}
  creationTimestamp: "2023-09-21T09:18:35Z"
  generation: 3
  name: pdcpu
  resourceVersion: "266775645"
  uid: 5f9a7e8c-91dc-4481-b8ad-927b11806d6c
spec:
  reclaimable: false
  weight: 1
status:
  allocated:
    cpu: "4"
    memory: 4Gi
    nvidia.com/gpu: "0"
  reservation: {}
  running: 5
  state: Open
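The expected figure follows directly from the job list above. As a quick sanity check, the arithmetic can be written out as a small shell sketch; the variable names are illustrative only, and the numbers are the ones quoted in this report.

```shell
#!/bin/sh
# Expected queue allocation, using the figures quoted in this report.
jobs=5            # Running jobs in queue pdcpu
tasks_per_job=2   # tasks in each job
cpu_per_task=1    # CPU requested by each task

expected_cpu=$((jobs * tasks_per_job * cpu_per_task))
echo "expected allocated cpu: ${expected_cpu}"
```

This prints 10, whereas the queue status above reports cpu: "4".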
What you expected to happen: Querying the queue status several times with the following command should return the same allocated CPU every time:
kubectl get queue pdcpu -oyaml
Expected result:
Every execution: allocated: cpu: "10"
Actual result:
1st execution: allocated: cpu: "2"
2nd execution: allocated: cpu: "4"
3rd execution: allocated: cpu: "4"
The running count, however, is correct on every execution: running: 5
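To make the fluctuation easier to capture, the status can be sampled in a loop rather than by hand. The following is a minimal sketch: the function name sample_allocated_cpu is hypothetical, and it assumes kubectl is already pointed at the affected cluster. It relies only on kubectl's standard jsonpath output.

```shell
#!/bin/sh
# sample_allocated_cpu QUEUE COUNT
# Prints .status.allocated.cpu of QUEUE once per second, COUNT times.
# (Hypothetical helper name; not part of Volcano or kubectl.)
sample_allocated_cpu() {
    queue=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        cpu=$(kubectl get queue "$queue" -o jsonpath='{.status.allocated.cpu}')
        echo "execution ${i}: allocated cpu=${cpu}"
        i=$((i + 1))
        sleep 1
    done
}
```

For example, `sample_allocated_cpu pdcpu 10` takes ten samples from the queue used here. One observation that may be worth checking: the three values reported above (2, 4 and 4) add up to the expected 10, which would be consistent with each scheduler replica publishing only the allocation of the jobs it handles itself. This is an unconfirmed guess, not a diagnosed cause.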
How to reproduce it (as minimally and precisely as possible):
1. Set up the schedulers with the following config:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: volcano-scheduler
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  replicas: 3
  selector:
    matchLabels:
      app: volcano-scheduler
  serviceName: "volcano-scheduler"
  template:
    metadata:
      labels:
        app: volcano-scheduler
    spec:
      serviceAccount: volcano-scheduler
      containers:
        - name: volcano-scheduler
          args:
            - --logtostderr
            - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
            - --enable-healthz=true
            - --enable-metrics=true
            - --leader-elect=false
            - -v=3
            - 2>&1
          image: volcanosh/vc-scheduler:v1.8.0
          imagePullPolicy: IfNotPresent
          env:
            - name: MULTI_SCHEDULER_ENABLE
              value: "true"
            - name: SCHEDULER_NUM
              value: "3"
            - name: SCHEDULER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: scheduler-config
              mountPath: /volcano.scheduler
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: volcano-scheduler
      volumes:
        - name: scheduler-config
          configMap:
            name: volcano-scheduler-configmap
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - effect: NoSchedule
          key: node.kubernetes.io/unschedulable
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
---
apiVersion: v1
kind: Service
metadata:
  name: volcano-scheduler
  labels:
    app: volcano-scheduler
spec:
  ports:
    - port: 80
      name: volcano-scheduler
  clusterIP: None
  selector:
    app: volcano-scheduler
2. Set up a queue with the following config:
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"scheduling.volcano.sh/v1beta1","kind":"Queue","metadata":{"annotations":{},"name":"pdcpu"},"spec":{"reclaimable":false,"weight":1}}
  creationTimestamp: "2023-09-21T09:18:35Z"
  generation: 3
  name: pdcpu
  resourceVersion: "266780079"
  uid: 5f9a7e8c-91dc-4481-b8ad-927b11806d6c
spec:
  reclaimable: false
  weight: 1
3. Submit several jobs that each request the same amount of CPU to the same queue.
4. Get the queue status with kubectl get queue.
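The job manifests used for step 3 are not included in this report. As a rough stand-in, the sketch below prints a Volcano Job with 2 tasks requesting 1 CPU each in queue pdcpu and namespace ns-root, matching the description above. The function name render_job, the task name, the busybox image, the sleep command and schedulerName: volcano are assumptions made for illustration; the original jobs evidently also requested memory, which is omitted here.

```shell
#!/bin/sh
# render_job NAME
# Prints a Volcano Job manifest to stdout (hypothetical helper; the image,
# command, task name and schedulerName are illustrative assumptions).
render_job() {
    cat <<EOF
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: $1
  namespace: ns-root
spec:
  schedulerName: volcano
  queue: pdcpu
  minAvailable: 2
  tasks:
    - name: worker
      replicas: 2
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: worker
              image: busybox
              command: ["sleep", "3600"]
              resources:
                requests:
                  cpu: "1"
EOF
}
```

Five such jobs could then be submitted with `for i in 1 2 3 4 5; do render_job "cpu-test-$i" | kubectl apply -f -; done`, after which the queue should report 10 allocated CPUs if the status is aggregated correctly.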
Anything else we need to know?:
Environment:
- Volcano Version: 1.8
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
- OS (e.g. from /etc/os-release): CentOS Linux 7 (Core)
- Kernel (e.g. uname -a):
Linux d2-hpc-master-01 4.18.16-1.el7.elrepo.x86_64 #1 SMP Sat Oct 20 12:52:50 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Others:
The doc you mentioned currently has some problems. If you want to use multiple schedulers, see https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-volcano-schedulers.md instead :)