What happened:
A BroadcastJob is running and its pod is already in the Running state. Once activeDeadlineSeconds elapses, the BroadcastJob is marked as failed, even though the pod has already been running successfully.
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  creationTimestamp: "2024-08-15T07:03:30Z"
  generation: 1
  name: 1723705410-preheat-image
  resourceVersion: "180784897"
  uid: e6aacebb-2630-4dc8-8292-0a3897384a5b
spec:
  completionPolicy:
    activeDeadlineSeconds: 1800
    ttlSecondsAfterFinished: 30
    type: Always
  failurePolicy:
    restartLimit: 3
    type: FailFast
  parallelism: 20
  template:
    metadata:
      creationTimestamp: "2024-08-15T07:03:30Z"
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: Always
        name: preheat-image
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/worker: "true"
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  active: 1
  desired: 1
  failed: 0
  phase: running
  startTime: "2024-08-15T07:03:30Z"
  succeeded: 0

What you expected to happen:
The BroadcastJob status should be success. Why does this code not count a running pod as succeeded?
https://github.com/openkruise/kruise/blob/5421ee7c8e9e54ec38c1c1e20f4deb4a722f47a7/pkg/controller/broadcastjob/broadcastjob_controller.go#L243
How to reproduce it (as minimally and precisely as possible):
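Create a BroadcastJob like the one above, whose pod keeps running, and wait for activeDeadlineSeconds to elapse. Expected: once the pod is running, the BroadcastJob is treated as successful.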
Anything else we need to know?:
logs:
I0815 07:03:30.362776 7 broadcastjob_controller.go:200] Job 1723705410-preheat-image has ActiveDeadlineSeconds, will resync after 1800 seconds
I0815 07:03:30.363118 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 1/1 nodes remaining to schedule pods
I0815 07:03:30.363143 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=0, failed=0
I0815 07:03:30.392223 7 pod_readiness_controller.go:101] Starting to process Pod prdsafe/1723705410-preheat-image-8xqpn
I0815 07:03:30.392911 7 broadcastjob_controller.go:727] Controller 1723705410-preheat-image created pod 1723705410-preheat-image-8xqpn
I0815 07:03:30.392969 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.392982 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 362757824, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:30.393421 7 recorder.go:103] events "msg"="Normal" "message"="Created pod: 1723705410-preheat-image-8xqpn" "object"={"kind":"BroadcastJob","namespace":"prdsafe","name":"1723705410-preheat-image","uid":"e6aacebb-2630-4dc8-8292-0a3897384a5b","apiVersion":"apps.kruise.io/v1alpha1","resourceVersion":"180784893"} "reason"="SuccessfulCreate"
I0815 07:03:30.405421 7 broadcastjob_controller.go:200] Job 1723705410-preheat-image has ActiveDeadlineSeconds, will resync after 1800 seconds
I0815 07:03:30.405819 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:30.405845 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.405880 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.405893 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 405404075, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:30.411097 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:30.411118 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.411150 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.411188 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:30.420871 7 pod_readiness_controller.go:106] Finish to process Pod prdsafe/1723705410-preheat-image-8xqpn, elapsedTime 28.65228ms
I0815 07:03:30.420947 7 pod_readiness_controller.go:101] Starting to process Pod prdsafe/1723705410-preheat-image-8xqpn
I0815 07:03:30.420991 7 pod_readiness_controller.go:106] Finish to process Pod prdsafe/1723705410-preheat-image-8xqpn, elapsedTime 44.625µs
I0815 07:03:30.421275 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:30.421300 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.421332 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:30.421344 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:33.902923 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:33.902950 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:33.902980 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:33.902995 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
I0815 07:03:34.935621 7 broadcastjob_controller.go:250] prdsafe/1723705410-preheat-image has 0/1 nodes remaining to schedule pods
I0815 07:03:34.935649 7 broadcastjob_controller.go:251] Before broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:34.935678 7 broadcastjob_controller.go:327] After broadcastjob reconcile prdsafe/1723705410-preheat-image, desired=1, active=1, failed=0
I0815 07:03:34.935693 7 broadcastjob_controller.go:340] Updating job 1723705410-preheat-image status v1alpha1.BroadcastJobStatus{Conditions:[]v1alpha1.JobCondition(nil), StartTime:time.Date(2024, time.August, 15, 7, 3, 30, 0, time.Local), CompletionTime:, Active:1, Succeeded:0, Failed:0, Desired:1, Phase:"running"}
Environment:
- Kruise version: kruise/kruise-manager:v1.5.2
- Kubernetes version (use kubectl version): 1.19
- Install details (e.g. helm install args):
- Others: