Cannot apply a vcjob with volcano-release-1.6
What happened:
I uninstalled volcano-1.5.1 with kubectl delete -f ./volcano-1.5.1/volcano-development.yaml and reinstalled volcano-release-1.6 with kubectl apply -f ./volcano-release-1.6/volcano-development.yaml. When I apply a vcjob following step 2 of https://volcano.sh/en/docs/tutorials/, kubectl get node outputs "No resources found in default namespace", and the vcjob status is Pending, as follows:
```yaml
apiVersion: v1
items:
- apiVersion: batch.volcano.sh/v1alpha1
  kind: Job
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"batch.volcano.sh/v1alpha1","kind":"Job","metadata":{"annotations":{},"name":"job-1","namespace":"default"},"spec":{"minAvailable":1,"policies":[{"action":"RestartJob","event":"PodEvicted"}],"queue":"test","schedulerName":"volcano","tasks":[{"name":"nginx","policies":[{"action":"CompleteJob","event":"TaskCompleted"}],"replicas":1,"template":{"spec":{"containers":[{"command":["sleep","10m"],"image":"nginx:latest","name":"nginx","resources":{"limits":{"cpu":1},"requests":{"cpu":1}}}],"restartPolicy":"Never"}}}]}}
    creationTimestamp: "2022-06-13T10:26:42Z"
    generation: 1
    name: job-1
    namespace: default
    resourceVersion: "6854386"
    uid: 16b729d3-085d-4747-86b6-0ceb614b906e
  spec:
    maxRetry: 3
    minAvailable: 1
    policies:
    - action: RestartJob
      event: PodEvicted
    queue: test
    schedulerName: volcano
    tasks:
    - maxRetry: 3
      minAvailable: 1
      name: nginx
      policies:
      - action: CompleteJob
        event: TaskCompleted
      replicas: 1
      template:
        metadata: {}
        spec:
          containers:
          - command:
            - sleep
            - 10m
            image: nginx:latest
            name: nginx
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "1"
          restartPolicy: Never
  status:
    conditions:
    - lastTransitionTime: "2022-06-13T10:26:44Z"
      status: Pending
    minAvailable: 1
    state:
      lastTransitionTime: "2022-06-13T10:26:44Z"
      phase: Pending
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```
What you expected to happen: the vcjob runs normally, as it did with volcano-1.5.1.
How to reproduce it (as minimally and precisely as possible):
- Install volcano-release-1.6.
- Apply a vcjob as in step 2 of https://volcano.sh/en/docs/tutorials/.
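To make the reproduction self-contained, the job from step 2 can be rebuilt from the kubectl.kubernetes.io/last-applied-configuration annotation in the vcjob output above. This is a minimal sketch, assuming the annotation reflects exactly what was applied; the helper names and the job-1.json file name are hypothetical. kubectl apply -f accepts JSON as well as YAML.

```python
import json

# Job manifest copied from the last-applied-configuration annotation above.
JOB_1 = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "job-1", "namespace": "default"},
    "spec": {
        "minAvailable": 1,
        "schedulerName": "volcano",
        "queue": "test",
        "policies": [{"event": "PodEvicted", "action": "RestartJob"}],
        "tasks": [
            {
                "replicas": 1,
                "name": "nginx",
                "policies": [{"event": "TaskCompleted", "action": "CompleteJob"}],
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "command": ["sleep", "10m"],
                                "image": "nginx:latest",
                                "name": "nginx",
                                "resources": {
                                    "requests": {"cpu": 1},
                                    "limits": {"cpu": 1},
                                },
                            }
                        ],
                        "restartPolicy": "Never",
                    }
                },
            }
        ],
    },
}


def manifest_json() -> str:
    """Serialize the job manifest as pretty-printed JSON."""
    return json.dumps(JOB_1, indent=2)


def write_manifest(path: str) -> None:
    """Write the manifest to `path` (hypothetical helper) for kubectl apply -f."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(manifest_json())
```

Usage: call write_manifest("job-1.json"), then run kubectl apply -f job-1.json. Note that the job targets the queue test, which the tutorial creates in step 1.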
Anything else we need to know?: Is it related to uninstalling volcano-1.5.1?
Environment:
- Volcano Version: 1.6
- Kubernetes version (use kubectl version): 1.21.3
- Cloud provider or hardware configuration: server machines with 4 Nvidia V100
- OS (e.g. from /etc/os-release): CentOS Linux release 7.6.1810 (Core)
- Kernel (e.g. uname -a): 3.10.0-957.5.1.el7.x86_64
- Install tools: kubectl apply -f volcano-development.yaml
- Others:
/assign @Thor-wl Please help take a look.
Please check the status of the volcano components:
kubectl get po -n volcano-system
If possible, please also attach the logs of the volcano components.
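The component check above can be scripted. This is a minimal sketch, assuming the default table layout printed by kubectl get po (NAME, READY, STATUS, ... columns); the function name is hypothetical. It treats a pod as healthy if it is Completed (such as the one-shot admission-init pod) or Running with all containers ready.

```python
from typing import List, Tuple


def unhealthy_pods(kubectl_output: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for pods that do not look healthy."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    problems = []
    for line in lines[1:]:  # skip the NAME READY STATUS ... header row
        name, ready, status = line.split()[:3]
        ready_now, ready_total = ready.split("/")
        if status == "Completed":
            continue  # finished one-shot pods are expected
        if status == "Running" and ready_now == ready_total:
            continue  # running with every container ready
        problems.append((name, ready, status))
    return problems
```

Usage: capture the text printed by kubectl get po -n volcano-system and pass it to unhealthy_pods; an empty list means every component pod is Running and ready or Completed.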
Any progress on this issue? Can we reproduce it?
@kongjibai Can you give more details about the scenario? For example, please run kubectl describe vcjob job-1 to see the job status and the status of the corresponding podgroup.
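Finding the corresponding podgroup takes one extra step. Judging from the outputs later in this thread, the podgroup for job-1 is named after the job name plus the job UID (job-1-2cadba62-3f14-4c59-b09b-5bfcb8cce0d5 for UID 2cadba62-3f14-4c59-b09b-5bfcb8cce0d5). The sketch below is built on that observation only; the naming rule is an assumption, not taken from the Volcano source, and the helper names are hypothetical. It expects the JSON printed by kubectl get vcjob job-1 -o json.

```python
import json
from typing import List


def podgroup_name(vcjob: dict) -> str:
    """Guess the podgroup name as <job name>-<job UID>.

    Assumption: pattern inferred from the describe output in this issue.
    """
    meta = vcjob["metadata"]
    return f"{meta['name']}-{meta['uid']}"


def describe_commands(vcjob_json: str) -> List[str]:
    """Build kubectl commands that describe a vcjob and its podgroup."""
    job = json.loads(vcjob_json)
    ns = job["metadata"].get("namespace", "default")
    return [
        f"kubectl describe vcjob {job['metadata']['name']} -n {ns}",
        f"kubectl describe podgroup {podgroup_name(job)} -n {ns}",
    ]
```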
Sorry for the late reply. The output is as follows:
```
NAME                                  READY   STATUS      RESTARTS   AGE
volcano-admission-6c68cbbf98-s6twp    1/1     Running     0          8m14s
volcano-admission-init-nnbfq          0/1     Completed   0          8m14s
volcano-controllers-f4b69577b-99cfp   1/1     Running     0          8m14s
volcano-scheduler-c98cb745b-kqgpr     1/1     Running     0          8m14s
```
Sorry for the late reply. Here is the output of kubectl describe vcjob job-1. It reports that the pod group is not ready. The job runs normally with volcano-release-1.5 but fails with volcano-release-1.6. How can I solve this problem?
```
Name:         job-1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  batch.volcano.sh/v1alpha1
Kind:         Job
Metadata:
  Creation Timestamp:  2022-08-16T09:30:04Z
  Generation:          1
  Managed Fields:
    API Version:  batch.volcano.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:minAvailable:
        f:state:
          .:
          f:lastTransitionTime:
          f:phase:
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-08-16T09:30:04Z
    API Version:  batch.volcano.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:minAvailable:
        f:policies:
        f:queue:
        f:schedulerName:
        f:tasks:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2022-08-16T09:30:04Z
  Resource Version:  5063533
  UID:               2cadba62-3f14-4c59-b09b-5bfcb8cce0d5
Spec:
  Max Retry:      3
  Min Available:  1
  Policies:
    Action:        RestartJob
    Event:         PodEvicted
  Queue:           test
  Scheduler Name:  volcano
  Tasks:
    Max Retry:      3
    Min Available:  1
    Name:           nginx
    Policies:
      Action:  CompleteJob
      Event:   TaskCompleted
    Replicas:  1
    Template:
      Metadata:
      Spec:
        Containers:
          Command:
            sleep
            10m
          Image:  nginx:latest
          Name:   nginx
          Resources:
            Limits:
              Cpu:  1
            Requests:
              Cpu:  1
        Restart Policy:  Never
Status:
  Conditions:
    Last Transition Time:  2022-08-16T09:30:10Z
    Status:                Pending
  Min Available:           1
  State:
    Last Transition Time:  2022-08-16T09:30:10Z
    Phase:                 Pending
Events:
  Type     Reason           Age   From                   Message
  ----     ------           ----  ----                   -------
  Warning  PodGroupPending  2m6s  vc-controller-manager  PodGroup default:job-1 unschedule,reason: 1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
```
This output provides little useful information for debugging. Have you described the podgroup for more details, or searched the logs?
The podgroup description is below. It reports NotEnoughResources, but I'm sure the Kubernetes cluster has enough resources, including CPU, memory and GPU, because everything works in volcano-release-1.5.
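For scale, the gang's minimum requirement in this job is tiny. The sketch below is an illustration, not Volcano's controller code: it assumes the minimum is the sum of CPU requests and limits over the first minAvailable pods, counted task by task, and that CPU values are plain numbers of cores (strings like "500m" are not handled). With one nginx replica requesting 1 CPU and minAvailable 1, it yields 1 pod and 1 CPU, which matches the Min Resources shown in the podgroup. The function name is hypothetical.

```python
from typing import Dict, List


def gang_min_cpu(tasks: List[dict], min_available: int) -> Dict[str, float]:
    """Sum CPU requests/limits over the first `min_available` pods.

    Assumption: CPU quantities are whole or decimal cores, not millicores.
    """
    totals = {"pods": 0, "requests.cpu": 0.0, "limits.cpu": 0.0}
    remaining = min_available
    for task in tasks:
        if remaining <= 0:
            break
        n = min(task["replicas"], remaining)  # pods of this task in the gang
        for container in task["template"]["spec"]["containers"]:
            res = container.get("resources", {})
            totals["requests.cpu"] += n * float(res.get("requests", {}).get("cpu", 0))
            totals["limits.cpu"] += n * float(res.get("limits", {}).get("cpu", 0))
        totals["pods"] += n
        remaining -= n
    return totals
```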
```
Name:         job-1-2cadba62-3f14-4c59-b09b-5bfcb8cce0d5
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.volcano.sh/v1beta1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2022-08-16T09:30:04Z
  Generation:          955
  Managed Fields:
    API Version:  scheduling.volcano.sh/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:ownerReferences:
          .:
          k:{"uid":"2cadba62-3f14-4c59-b09b-5bfcb8cce0d5"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:minMember:
        f:minResources:
          .:
          f:count/pods:
          f:cpu:
          f:limits.cpu:
          f:pods:
          f:requests.cpu:
        f:minTaskMember:
          .:
          f:nginx:
        f:queue:
      f:status:
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-08-16T09:30:04Z
    API Version:  scheduling.volcano.sh/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:phase:
    Manager:    vc-scheduler
    Operation:  Update
    Time:       2022-08-16T09:30:05Z
  Owner References:
    API Version:           batch.volcano.sh/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Job
    Name:                  job-1
    UID:                   2cadba62-3f14-4c59-b09b-5bfcb8cce0d5
  Resource Version:        5159589
  UID:                     95e8e0c3-12c1-4455-9edd-aa4e298800d9
Spec:
  Min Member:  1
  Min Resources:
    count/pods:    1
    Cpu:           1
    limits.cpu:    1
    Pods:          1
    requests.cpu:  1
  Min Task Member:
    Nginx:  1
  Queue:    test
Status:
  Conditions:
    Last Transition Time:  2022-08-17T03:00:41Z
    Message:               1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
    Reason:                NotEnoughResources
    Status:                True
    Transition ID:         97f14a76-f54d-441d-992d-7b732621e7a3
    Type:                  Unschedulable
  Phase:                   Pending
Events:
  Type     Reason         Age                    From     Message
  ----     ------         ----                   ----     -------
  Warning  Unschedulable  58s (x62798 over 17h)  volcano  0/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
```
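Since the Unschedulable event above fired 62798 times over 17 hours, it can help to watch the podgroup's condition directly instead of rereading describe output. This is a minimal sketch, assuming the object comes from kubectl get podgroup <name> -o json, where the condition fields (type, status, reason, message) appear in lowercase as in the API; the function name is hypothetical.

```python
from typing import Optional


def unschedulable_reason(podgroup: dict) -> Optional[str]:
    """Return "reason: message" for a True Unschedulable condition, else None."""
    for cond in podgroup.get("status", {}).get("conditions", []):
        if cond.get("type") == "Unschedulable" and cond.get("status") == "True":
            return f"{cond.get('reason')}: {cond.get('message')}"
    return None
```

For the podgroup above, this returns the NotEnoughResources reason together with the "1/0 tasks in gang unschedulable" message; once the scheduler admits the gang and that condition is no longer True, it returns None.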