
[BUG] BroadcastJob status abnormal

Open • ying2025 opened this issue 6 months ago • 1 comment

What happened: The BroadcastJob was running and its pod was already running, but once activeDeadlineSeconds elapsed, the BroadcastJob's status became failed, even though the pod had already run successfully.

    apiVersion: apps.kruise.io/v1alpha1
    kind: BroadcastJob
    metadata:
      creationTimestamp: "2024-08-15T07:03:30Z"
      generation: 1
      name: 1723705410-preheat-image
      resourceVersion: "180784897"
      uid: e6aacebb-2630-4dc8-8292-0a3897384a5b
    spec:
      completionPolicy:
        activeDeadlineSeconds: 1800
        ttlSecondsAfterFinished: 30
        type: Always
      failurePolicy:
        restartLimit: 3
        type: FailFast
      parallelism: 20
      template:
        metadata:
          creationTimestamp: "2024-08-15T07:03:30Z"
        spec:
          containers:
          - image: nginx:latest
            imagePullPolicy: Always
            name: preheat-image
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          nodeSelector:
            node-role.kubernetes.io/worker: "true"
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    status:
      active: 1
      desired: 1
      failed: 0
      phase: running
      startTime: "2024-08-15T07:03:30Z"
      succeeded: 0
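The status above (active: 1, succeeded: 0, phase: running) indicates that the controller counts the Running pod as active, not as succeeded. The following is a minimal sketch that assumes Job-like semantics, as in the upstream Kubernetes Job: only a pod in phase Succeeded counts toward completion, and a job that has not reached its desired succeeded count when the deadline passes is marked failed. PodPhase, Counts, classifyPods and jobPhase are hypothetical names. Apart from "running", which appears in the status, the phase strings are assumptions. This is not Kruise's actual code. The nginx:latest container serves in the foreground and never exits on its own, so under these assumptions its pod stays Running and the job can only end as failed once the deadline passes.

```go
package main

import "fmt"

// PodPhase mirrors the string values of Kubernetes' corev1.PodPhase.
// It is redeclared here (hypothetically) so the sketch needs only the standard library.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Counts is a hypothetical stand-in for the Active/Succeeded/Failed
// fields of BroadcastJobStatus.
type Counts struct {
	Active, Succeeded, Failed int
}

// classifyPods counts pods by phase. Only a pod whose containers have all
// terminated successfully (phase Succeeded) counts as succeeded; a Running
// pod is still active.
func classifyPods(phases []PodPhase) Counts {
	var c Counts
	for _, p := range phases {
		switch p {
		case PodSucceeded:
			c.Succeeded++
		case PodFailed:
			c.Failed++
		default: // Pending or Running
			c.Active++
		}
	}
	return c
}

// jobPhase derives an assumed job-level phase from the pod counts.
func jobPhase(c Counts, desired int, pastDeadline bool) string {
	switch {
	case c.Succeeded == desired:
		return "completed"
	case c.Failed > 0:
		// Assumed FailFast behaviour: any failed pod fails the job
		// (restartLimit handling is omitted from this sketch).
		return "failed"
	case pastDeadline:
		// The deadline passed while pods were still active.
		return "failed"
	default:
		return "running"
	}
}

func main() {
	// One node and one nginx pod that never exits, as in the manifest above.
	c := classifyPods([]PodPhase{PodRunning})
	fmt.Println(c, jobPhase(c, 1, false), jobPhase(c, 1, true))
}
```

Running it prints `{1 0 0} running failed`. The single nginx pod is counted as active, and the job moves from running to failed only because the deadline flag is set.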

What you expected to happen: The BroadcastJob status should be success. Why doesn't this code count a running pod as succeeded? https://github.com/openkruise/kruise/blob/5421ee7c8e9e54ec38c1c1e20f4deb4a722f47a7/pkg/controller/broadcastjob/broadcastjob_controller.go#L243

How to reproduce it (as minimally and precisely as possible): Once the pod is running, the BroadcastJob should be reported as successful.

Anything else we need to know?: Logs:

    I0815 07:03:30.362776 7 broadcastjob_controller.go:200] Job 1723705410-preheat-image has ActiveDeadlineSeconds, will resync after 1800 seconds
    I0815 07:03:30.363118 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 1/1 nodes remaining to schedule pods
    I0815 07:03:30.363143 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=0, failed=0
    I0815 07:03:30.392223 7 pod_readiness_controller.go:101] Starting to process Pod prdsafe/1723705410-preheat-image-8xqpn
    I0815 07:03:30.392911 7 broadcastjob_controller.go:727] Controller 1723705410-preheat-image created pod 1723705410-preheat-image-8xqpn
    I0815 07:03:30.392969 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:30.392982 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 362757824, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
    I0815 07:03:30.393421 7 recorder.go:103] events "msg"="Normal" "message"="Created pod: 1723705410-preheat-image-8xqpn" "object"={"kind":"BroadcastJob","namespace":"prdsafe","name":"1723705410-preheat-image","uid":"e6aacebb-2630-4dc8-8292-0a3897384a5b","apiVersion":"apps.kruise.io/v1alpha1","resourceVersion":"180784893"} "reason"="SuccessfulCreate"
    I0815 07:03:30.405421 7 broadcastjob_controller.go:200] Job 1723705410-preheat-image has ActiveDeadlineSeconds, will resync after 1800 seconds
    I0815 07:03:30.405819 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
    I0815 07:03:30.405845 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:30.405880 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:30.405893 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 405404075, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
    I0815 07:03:30.411097 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
    I0815 07:03:30.411118 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:30.411150 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:30.411188 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
    I0815 07:03:30.420871 7 pod_readiness_controller.go:106] Finish to process Pod prdsafe/1723705410-preheat-image-8xqpn, elapsedTime 28.65228ms
    I0815 07:03:30.420947 7 pod_readiness_controller.go:101] Starting to process Pod prdsafe/1723705410-preheat-image-8xqpn
    I0815 07:03:30.420991 7 pod_readiness_controller.go:106] Finish to process Pod prdsafe/1723705410-preheat-image-8xqpn, elapsedTime 44.625µs
    I0815 07:03:30.421275 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
    I0815 07:03:30.421300 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:30.421332 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:30.421344 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
    I0815 07:03:33.902923 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
    I0815 07:03:33.902950 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:33.902980 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:33.902995 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
    I0815 07:03:34.935621 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
    I0815 07:03:34.935649 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:34.935678 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
    I0815 07:03:34.935693 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}

Environment:

  • Kruise version: kruise/kruise-manager:v1.5.2
  • Kubernetes version (use kubectl version): 1.19
  • Install details (e.g. helm install args):
  • Others:

ying2025 • Aug 15 '24 07:08