volcano
imagelocality.weight does not take effect when using the nodeorder plugin
What happened: imagelocality.weight does not take effect when using the nodeorder plugin.
What you expected to happen: imagelocality.weight works when using the nodeorder plugin.
How to reproduce it (as minimally and precisely as possible):
- there are 3 worker nodes in my environment
[root@host-10-19-37-28 volcano]# kubectl get node
NAME STATUS ROLES AGE VERSION
host-10-19-37-27 Ready <none> 142d v1.22.2
host-10-19-37-28 Ready control-plane,master 147d v1.22.2
host-10-19-37-29 Ready <none> 147d v1.22.2
host-10-19-37-34 Ready <none> 145d v1.22.2
- job
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vc-gzfjob1
  namespace: test
spec:
  # minAvailable: 0
  schedulerName: volcano
  queue: test
  priorityClassName: high-priority
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 3
      name: gzfjob1
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          priorityClassName: high-priority
          containers:
            - command:
                - sleep
                - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 200m
                limits:
                  cpu: 200m
          restartPolicy: OnFailure
- only host-10-19-37-34 already has the nginx:latest image
- scheduler config:
- name: nodeorder
  arguments:
    nodeaffinity.weight: 0
    podaffinity.weight: 0
    leastrequested.weight: 0
    balancedresource.weight: 0
    mostrequested.weight: 0
    tainttoleration.weight: 0
    imagelocality.weight: 100
- deploy the job
[root@host-10-19-37-28 volcano]# kubectl create -f ./queuejob.yaml
job.batch.volcano.sh/vc-gzfjob1 created
[root@host-10-19-37-28 volcano]# kubectl -n test get po
NAME READY STATUS RESTARTS AGE
vc-gzfjob1-gzfjob1-0 0/1 ContainerCreating 0 8s
vc-gzfjob1-gzfjob1-1 0/1 ContainerCreating 0 8s
vc-gzfjob1-gzfjob1-2 0/1 ContainerCreating 0 8s
[root@host-10-19-37-28 volcano]# kubectl -n test get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
vc-gzfjob1-gzfjob1-0 0/1 ContainerCreating 0 12s <none> host-10-19-37-27 <none> <none>
vc-gzfjob1-gzfjob1-1 0/1 ContainerCreating 0 12s <none> host-10-19-37-27 <none> <none>
vc-gzfjob1-gzfjob1-2 0/1 ContainerCreating 0 12s <none> host-10-19-37-27 <none> <none>
The scheduler chose node host-10-19-37-27 instead of host-10-19-37-34, the only node that already has the image.
Anything else we need to know?:
Environment:
- Volcano Version: 1.16
- Kubernetes version (use kubectl version): 1.22
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release): centos 7.5
- Kernel (e.g. uname -a): 3.10.0-957.27.2.el7.x86_64
- Install tools: kubeadmin
- Others:
I added the following debug logging to session.go:
snapshot := cache.Snapshot()
klog.Warningf("3333333333 argument: %v", snapshot.Nodes)
The scheduler log then shows the content of the session snapshot:
W0919 10:47:32.894246 1 session.go:142] 3333333333 argument: map[host-10-19-37-27:Node (host-10-19-37-27): allocatable<cpu 8000.00, memory 33631535104.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00> idle <cpu 6900.00, memory 32348078080.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>, used <cpu 1100.00, memory 1283457024.00>, releasing <cpu 0.00, memory 0.00>, oversubscribution <cpu 0.00, memory 0.00>, state <phase Ready, reaseon >, oversubscributionNode <false>, offlineJobEvicting <false>,taints <[]>
0: Task (4b7873f0-d37a-4386-97ab-76daf0a21692:volcano-system/volcano-scheduler-796fbd96b9-pbqmh): job , status Running, pri 2000000000resreq cpu 0.00, memory 0.00, preemptable false, revocableZone , numaInfo { map[]}
1: Task (8509439a-5694-4c2d-8c39-2549b59e053f:hive-instance-hive1/metastore-server-66f77bd7-6jdkq): job , status Running, pri 1000000resreq cpu 1000.00, memory 1073741824.00, preemptable false, revocableZone , numaInfo { map[]}
2: Task (42235660-1d07-4065-b9d0-e1cc03d1f517:kube-system/kube-proxy-w4l4h): job , status Running, pri 2000001000resreq cpu 0.00, memory 0.00, preemptable false, revocableZone , numaInfo { map[]}
3: Task (c848bfca-523c-421a-8c22-42ad6fc46a44:kube-system/weave-net-qfhgt): job , status Running, pri 2000001000resreq cpu 100.00, memory 209715200.00, preemptable false, revocableZone , numaInfo { map[]} host-10-19-37-28:Node (host-10-19-37-28): allocatable<cpu 8000.00, memory 33631531008.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00> idle <cpu 7050.00, memory 33170157568.00, hugepages-2Mi 0.00, hugepages-1Gi 0.00>, used <cpu 950.00, memory 461373440.00>, releasing <cpu 0.00, memory 0.00>, oversubscribution <cpu 0.00, memory 0.00>, state <phase Ready, reaseon >, oversubscributionNode <false>, offlineJobEvicting <false>,taints <[{node-role.kubernetes.io/master NoSchedule <nil>}]>
0: Task (5966159e-3b2d-44e2-a87a-5974b70d9cf7:kube-system/kube-apiserver-host-10-19-37-28): job , status Running, pri 2000001000resreq cpu 250.00, memory 0.00, preemptable false, revocableZone , numaInfo { map[]}
It looks like the images on the nodes are not included in the cache.
@wangyang0616 please take a look at this issue :)
I followed the steps provided by @zhifanggao and confirmed that the imagelocality policy does not take effect.
The imagelocality policy also does not take effect when the default kube-scheduler is used, so I suspect that the scoring logic of the imagelocality policy in K8S itself is incorrect.
An issue has been created in the K8S community: https://github.com/kubernetes/kubernetes/issues/112699
The cause of the problem has been found. The image name configured in the YAML file does not match the image name recorded on the node.
For example, if the name of the image on the node is docker.io/library/nginx:latest and the image in the YAML file is nginx:latest, the two names are compared as different strings, because K8S does not implement intelligent matching of the image name prefix. During scheduling scoring, the system therefore considers that the nginx image does not exist on any worker node. As a result, the imagelocality policy has no effect.
You can try the following method to work around the problem:
Run the docker images command to query the local image name, change the image name in the YAML file to be the same as the image name on the node, and perform scheduling again.
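To illustrate the mismatch described above, here is a minimal Go sketch. The helper normalizeImageName is hypothetical (it is not part of Kubernetes or Volcano) and only assumes the usual Docker Hub defaults: registry docker.io, namespace library, tag latest. It shows why a plain string comparison between the two names fails and what kind of prefix expansion would make them match.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeImageName is a hypothetical helper for illustration only.
// It expands a short image reference using the usual Docker Hub defaults
// (registry "docker.io", namespace "library", tag "latest").
func normalizeImageName(name string) string {
	// Keep a digest ("@sha256:...") untouched and handle the rest separately.
	digest := ""
	if i := strings.Index(name, "@"); i >= 0 {
		name, digest = name[:i], name[i:]
	}
	// A tag is a ":" that appears after the last "/".
	lastSlash := strings.LastIndex(name, "/")
	if digest == "" && !strings.Contains(name[lastSlash+1:], ":") {
		name += ":latest"
	}
	// The first path component is treated as a registry host only if it
	// contains "." or ":" or is "localhost".
	first := ""
	if i := strings.Index(name, "/"); i >= 0 {
		first = name[:i]
	}
	hasRegistry := first != "" && (strings.ContainsAny(first, ".:") || first == "localhost")
	switch {
	case hasRegistry:
		return name + digest
	case first == "": // e.g. "nginx:latest"
		return "docker.io/library/" + name + digest
	default: // e.g. "someuser/nginx:latest"
		return "docker.io/" + name + digest
	}
}

func main() {
	onNode := "docker.io/library/nginx:latest" // name recorded on the node
	inSpec := "nginx:latest"                   // name used in the Job YAML

	fmt.Println(inSpec == onNode)                     // false: plain comparison misses
	fmt.Println(normalizeImageName(inSpec) == onNode) // true: names match after expansion
}
```

Until the scheduler performs this kind of expansion itself, writing the full name in the YAML file has the same effect.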
fix PR: https://github.com/volcano-sh/volcano/pull/2512
The node information comes from the (*Node) object obtained from the apiserver. It is saved into the scheduler cache, then copied by snapshot(), and finally copied into the node map in nodeorder.go. The image information exists in (*Node), but it is lost in the scheduler cache, in snapshot(), and in the node map in nodeorder.go, so the imagelocality score is always 0.
The solution is to save the image information into the scheduler cache, the snapshot, and the node map in nodeorder.go.
New PR: https://github.com/volcano-sh/volcano/pull/2543