PipelineRun is canceled prior to its child TaskRuns being canceled.
Expected Behavior
If a PipelineRun is cancelled during execution, the pipelinerun reconciler should only mark the PipelineRun with a terminal status after all of its child TaskRuns have been successfully cancelled by the taskrun reconciler. To users, this means that:
- The PipelineRun completion time should be no earlier than its child TaskRuns' completion times.
- The final PipelineRun update should be emitted after all of its child TaskRuns' final updates are emitted (a rough check of these expectations is sketched below).
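To make these expectations concrete, here is a minimal sketch of the check, assuming the PipelineRun and its child TaskRuns have already been fetched (for example with the Tekton clientset); the function itself is illustrative and not part of any Tekton API.

package ordering

import (
	v1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1"
)

// completedAfterChildren reports whether the PipelineRun reached its terminal
// state no earlier than every one of its child TaskRuns.
func completedAfterChildren(pr *v1.PipelineRun, children []*v1.TaskRun) bool {
	if pr.Status.CompletionTime == nil {
		return false // the PipelineRun is not terminal yet
	}
	for _, tr := range children {
		if tr.Status.CompletionTime == nil {
			// A child TaskRun has no completion time yet, so the PipelineRun
			// should not have one either.
			return false
		}
		if pr.Status.CompletionTime.Time.Before(tr.Status.CompletionTime.Time) {
			// This is the behaviour reported below: the PipelineRun "finished"
			// before one of its children.
			return false
		}
	}
	return true
}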
Actual Behavior
- The final PipelineRun update is emitted before the child TaskRuns' final updates.
- The PipelineRun completion time is earlier than the child TaskRuns' completion times.
Steps to Reproduce the Problem
- Patch in this commit, which adds logs to the controller.
- Run this PipelineRun:
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: tekton-pr
spec:
  pipelineSpec:
    tasks:
      - name: t1
        taskSpec:
          steps:
            - image: busybox
              script: |
                sleep 600
- Cancel the PipelineRun by applying this modified YAML:
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: tekton-pr
spec:
  status: "Cancelled"
  pipelineSpec:
    tasks:
      - name: t1
        taskSpec:
          steps:
            - image: busybox
              script: |
                sleep 600
- Check the pipeline controller log; you will see that the terminal PipelineRun update is emitted first and has an earlier completion time (search for ">>> emitting").
{"severity":"warn","timestamp":"2024-06-06T15:49:43.749Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:253","message":">>> emitting pipelinerun, name: tekton-pr, completion time: 2024-06-06 15:49:43.749423811 +0000 UTC m=+87.911781253, condition status: False, condition ltt: {2024-06-06 15:49:43.749422032 +0000 UTC m=+87.911779516}","commit":"9760223-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"5bb03973-86fb-42e4-b1f6-858ef01876fe","knative.dev/key":"default/tekton-pr"}
{"severity":"warn","timestamp":"2024-06-06T15:49:43.749Z","logger":"tekton-pipelines-controller","caller":"taskrun/taskrun.go:777","message":"stopping task run \"tekton-pr-t1\" because of \"TaskRunCancelled\"","commit":"9760223-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"090e81e8-7394-4ddc-8d15-4551931c8abd","knative.dev/key":"default/tekton-pr-t1"}
{"severity":"info","timestamp":"2024-06-06T15:49:43.749Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:364","message":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"default\", Name:\"tekton-pr\", UID:\"32b2ce33-3cb2-4502-9165-fcf4457e7375\", APIVersion:\"tekton.dev/v1\", ResourceVersion:\"169834134\", FieldPath:\"\"}): type: 'Warning' reason: 'Failed' PipelineRun \"tekton-pr\" was cancelled","commit":"9760223-dirty"}
{"severity":"warn","timestamp":"2024-06-06T15:49:43.768Z","logger":"tekton-pipelines-controller","caller":"taskrun/taskrun.go:166","message":">>> emitting taskrun, name: tekton-pr-t1, completion time: 2024-06-06 15:49:43.749754967 +0000 UTC m=+87.912112418, condition status: False, condition ltt: {2024-06-06 15:49:43.74975262 +0000 UTC m=+87.912110080}","commit":"9760223-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"090e81e8-7394-4ddc-8d15-4551931c8abd","knative.dev/key":"default/tekton-pr-t1"}
Additional Info
- Kubernetes version (output of kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:41:01Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.9-gke.1000000", GitCommit:"21b78d6e66a8659125f752293620ab9b2ed9bd7b", GitTreeState:"clean", BuildDate:"2024-04-17T09:14:21Z", GoVersion:"go1.21.9 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.28) exceeds the supported minor version skew of +/-1
- Tekton Pipeline version (output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'):
latest main
It looks like Tekton intends to mark the PipelineRun as cancelled only after all of its child TaskRuns have been successfully marked as cancelled (code source).
However, the taskrun reconciler needs some time to reconcile, mark a child TaskRun as cancelled, and emit its final update after the pipelinerun reconciler patches the TaskRun. Since the pipelinerun reconciler cancels the PipelineRun and emits its final update immediately after patching the TaskRuns, the PipelineRun is emitted before the TaskRuns are emitted and carries an earlier completion time.
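To make the ordering concrete, here is a simplified, hypothetical sketch of the flow described above. The function and the patchFn/markFn helpers are illustrative stand-ins, not the actual code in pkg/reconciler/pipelinerun.

package sketch

import (
	"context"
	"time"
)

// cancelPipelineRun mimics the current shape of cancellation: patchFn stands in
// for the client call that sets spec.status: "TaskRunCancelled" on a child
// TaskRun, and markFn stands in for writing the PipelineRun's own terminal
// condition and completion time.
func cancelPipelineRun(ctx context.Context, childTaskRuns []string,
	patchFn func(ctx context.Context, taskRunName string) error,
	markFn func(completionTime time.Time)) error {

	// 1. Ask each child TaskRun to cancel. This only *requests* cancellation:
	//    the taskrun reconciler still has to observe the patch, stop the pod,
	//    and write the TaskRun's terminal condition in a later reconcile.
	for _, name := range childTaskRuns {
		if err := patchFn(ctx, name); err != nil {
			return err
		}
	}

	// 2. Immediately mark the PipelineRun itself as cancelled, in the same
	//    reconcile, without waiting for step 1 to take effect. The completion
	//    time is stamped here, which is why the PipelineRun's terminal update
	//    is emitted before (and carries an earlier timestamp than) the child
	//    TaskRuns' terminal updates.
	markFn(time.Now())
	return nil
}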
I looked into the log a bit more. It looks like the parent PipelineRun was cancelled twice, so two terminal PipelineRun events were emitted (the first and last lines in the following log). Both events report the PipelineRun with a terminal condition (status == False, meaning it is done), but the completion time and lastTransitionTime change in the second event, which is confusing.
{"severity":"warn","timestamp":"2024-06-07T00:02:46.271Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:253","message":">>>> emitting pipelinerun, name: tekton-pr-emit-ha, completion time: 2024-06-07 00:02:46.271534506 +0000 UTC m=+113.976747118, condition status: False, condition ltt: {2024-06-07 00:02:46.271533558 +0000 UTC m=+113.976746181}","commit":"edf7e90-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"3501b088-0278-43ac-b67c-e25e44298096","knative.dev/key":"default/tekton-pr-emit-ha"}
{"severity":"info","timestamp":"2024-06-07T00:02:46.271Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:364","message":"Event(v1.ObjectReference{Kind:\"PipelineRun\", Namespace:\"default\", Name:\"tekton-pr-emit-ha\", UID:\"e96bb31f-a06b-46f9-b3a1-a74a50360d04\", APIVersion:\"tekton.dev/v1\", ResourceVersion:\"170179062\", FieldPath:\"\"}): type: 'Warning' reason: 'Failed' PipelineRun \"tekton-pr-emit-ha\" was cancelled","commit":"edf7e90-dirty"}
{"severity":"warn","timestamp":"2024-06-07T00:02:46.286Z","logger":"tekton-pipelines-controller","caller":"taskrun/taskrun.go:166","message":">>>> emitting taskrun, name: tekton-pr-emit-ha-matrix-task-ha, completion time: 2024-06-07 00:02:46.271505564 +0000 UTC m=+113.976718177, condition status: False, condition ltt: {2024-06-07 00:02:46.271493449 +0000 UTC m=+113.976706071}","commit":"edf7e90-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"e0a1fea5-640b-40ec-a647-984ee0248077","knative.dev/key":"default/tekton-pr-emit-ha-matrix-task-ha"}
{"severity":"info","timestamp":"2024-06-07T00:02:46.288Z","logger":"tekton-pipelines-controller.event-broadcaster","caller":"record/event.go:364","message":"Event(v1.ObjectReference{Kind:\"TaskRun\", Namespace:\"default\", Name:\"tekton-pr-emit-ha-matrix-task-ha\", UID:\"941e7a91-c8b3-400c-b29c-56f79237057f\", APIVersion:\"tekton.dev/v1\", ResourceVersion:\"170179063\", FieldPath:\"\"}): type: 'Warning' reason: 'Failed' TaskRun \"tekton-pr-emit-ha-matrix-task-ha\" was cancelled. TaskRun cancelled as the PipelineRun it belongs to has been cancelled.","commit":"edf7e90-dirty"}
{"severity":"info","timestamp":"2024-06-07T00:02:46.296Z","logger":"tekton-pipelines-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"edf7e90-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"3501b088-0278-43ac-b67c-e25e44298096","knative.dev/key":"default/tekton-pr-emit-ha","duration":0.047430652}
{"severity":"info","timestamp":"2024-06-07T00:02:46.296Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/cancel.go:139","message":"cancelling TaskRun tekton-pr-emit-ha-matrix-task-ha","commit":"edf7e90-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"47592fa6-3a06-4274-b05e-58d33303fc21","knative.dev/key":"default/tekton-pr-emit-ha"}
{"severity":"info","timestamp":"2024-06-07T00:02:46.311Z","logger":"tekton-pipelines-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"edf7e90-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"e0a1fea5-640b-40ec-a647-984ee0248077","knative.dev/key":"default/tekton-pr-emit-ha-matrix-task-ha","duration":0.040763412}
{"severity":"info","timestamp":"2024-06-07T00:02:46.312Z","logger":"tekton-pipelines-controller","caller":"taskrun/taskrun.go:149","message":"taskrun done : tekton-pr-emit-ha-matrix-task-ha \n","commit":"edf7e90-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"a65f95f4-9b47-40f6-9b0f-93f325712bca","knative.dev/key":"default/tekton-pr-emit-ha-matrix-task-ha"}
{"severity":"info","timestamp":"2024-06-07T00:02:46.312Z","logger":"tekton-pipelines-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"edf7e90-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"a65f95f4-9b47-40f6-9b0f-93f325712bca","knative.dev/key":"default/tekton-pr-emit-ha-matrix-task-ha","duration":0.000230135}
{"severity":"warn","timestamp":"2024-06-07T00:02:46.331Z","logger":"tekton-pipelines-controller","caller":"pipelinerun/pipelinerun.go:253","message":">>>> emitting pipelinerun, name: tekton-pr-emit-ha, completion time: 2024-06-07 00:02:46.331209788 +0000 UTC m=+114.036422401, condition status: False, condition ltt: {2024-06-07 00:02:46.331207784 +0000 UTC m=+114.036420413}","commit":"edf7e90-dirty","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.pipelinerun.Reconciler","knative.dev/kind":"tekton.dev.PipelineRun","knative.dev/traceid":"47592fa6-3a06-4274-b05e-58d33303fc21","knative.dev/key":"default/tekton-pr-emit-ha"}
Right now, when we cancel the PipelineRun, we update the condition of the PipelineRun status and mark it as cancelled. At the same time, we submit the patches for the child TaskRuns. That's why it appears that the PipelineRun was cancelled before the child TaskRuns.
I think the correct approach would be for the PipelineRun controller to send the cancellation patches to the child TaskRuns and, before updating its own condition to cancelled, wait until all of the child TaskRuns have been cancelled; only then should it cancel itself.
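A rough sketch of what that could look like; patchFn, isDoneFn and markFn are hypothetical stand-ins, not Tekton APIs.

package sketch

import (
	"context"
	"time"
)

// cancelAndWait requests cancellation of every child TaskRun but only marks the
// PipelineRun cancelled once all of them have reached a terminal state.
// isDoneFn stands in for checking that a child TaskRun's Succeeded condition is
// terminal (True or False), i.e. the taskrun reconciler has finished with it.
func cancelAndWait(ctx context.Context, childTaskRuns []string,
	patchFn func(ctx context.Context, taskRunName string) error,
	isDoneFn func(ctx context.Context, taskRunName string) bool,
	markFn func(completionTime time.Time)) (done bool, err error) {

	// 1. Request cancellation of every child TaskRun, exactly as today.
	for _, name := range childTaskRuns {
		if err := patchFn(ctx, name); err != nil {
			return false, err
		}
	}

	// 2. Do NOT mark the PipelineRun cancelled yet: if any child is still
	//    non-terminal, return and let a later reconcile (triggered by the
	//    TaskRun status updates) run this check again.
	for _, name := range childTaskRuns {
		if !isDoneFn(ctx, name) {
			return false, nil
		}
	}

	// 3. All children are terminal: only now stamp the PipelineRun's terminal
	//    condition and completion time, which is then guaranteed to be no
	//    earlier than any child's completion time.
	markFn(time.Now())
	return true, nil
}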
WDYT @tektoncd/core-maintainers ?
/assign @renzodavid9
@chitrangpatel: GitHub didn't allow me to assign the following users: renzodavid9.
/assign me
@renzodavid9: GitHub didn't allow me to assign the following users: me.
/assign @renzodavid9
Hello there,
To elaborate on @chitrangpatel comment, this change will make PipelineRun cancellation asynchronous. With this, we would need the PipelineRun to enter a "Cancelling" state when cancellation is triggered and remain there until all child TaskRuns are cancelled, at which point it will mark itself as "Cancelled". We currently don't have such a "Cancelling" state, so we'll need to add it and make the transition as explained before. It would look something like:
- When cancellation is triggered:
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: pipeline-execution
...
status:
  conditions:
    - lastTransitionTime: ...
      message: PipelineRun "pipeline-execution" is cancelling
      reason: Cancelling
      status: Unknown
      type: Succeeded
- After all the PipelineRun's child TaskRuns stop, it will go to "Cancelled":
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: pipeline-execution
...
status:
  conditions:
    - lastTransitionTime: "2024-07-22T20:13:42Z"
      message: PipelineRun "pipeline-execution" was cancelled
      reason: Cancelled
      status: "False"
      type: Succeeded
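For illustration, here is a minimal sketch of the condition transition this implies; the type and function are hypothetical, only the reason/status values mirror the examples above.

package sketch

import corev1 "k8s.io/api/core/v1"

// succeededCondition is a trimmed-down stand-in for the Succeeded condition a
// PipelineRun reports.
type succeededCondition struct {
	Status  corev1.ConditionStatus // Unknown while cancelling, False once cancelled
	Reason  string
	Message string
}

// nextCondition returns the Succeeded condition the PipelineRun should carry
// while being cancelled, given whether all of its child TaskRuns are terminal.
func nextCondition(prName string, allChildrenDone bool) succeededCondition {
	if !allChildrenDone {
		// Non-terminal: cancellation was requested but children are still stopping.
		return succeededCondition{
			Status:  corev1.ConditionUnknown,
			Reason:  "Cancelling",
			Message: "PipelineRun \"" + prName + "\" is cancelling",
		}
	}
	// Terminal: every child TaskRun has been cancelled.
	return succeededCondition{
		Status:  corev1.ConditionFalse,
		Reason:  "Cancelled",
		Message: "PipelineRun \"" + prName + "\" was cancelled",
	}
}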
I would like to validate this idea here to see if anyone has any thoughts about it @tektoncd/core-maintainers
Thanks!
I think I like the idea, but I wonder how impactful it might be for existing clients (and the change / addition of a new "reason"). I would think it might not be super breaking, as the "end state" doesn't change; there is just going to be an "Unknown" state in between, but we need to assess this. Otherwise, it makes sense to me.
Would it be possible to record this in an annotation instead of adding the Cancelling condition state, something like tekton.dev/canceling-state, so that it will not affect existing clients?
@chengjoey I'd rather go with the condition state. This is not backward-incompatible per se; it's just that clients might not know this state, but all the other states they know are still there, so it shouldn't break them.
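As a minimal sketch of that argument (the helper is hypothetical, but the condition shape is the knative apis.Condition that Tekton exposes): a client that keys off the Succeeded condition's status keeps working, because the proposed Cancelling reason only ever appears while the status is still Unknown.

package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"knative.dev/pkg/apis"
)

// isFinished reports whether a PipelineRun is done, looking only at the
// Succeeded condition's status and ignoring Reason entirely. A new
// "Cancelling" reason therefore cannot change the outcome: it would only show
// up while Status is still Unknown, which such clients already treat as
// "not finished yet".
func isFinished(succeeded *apis.Condition) bool {
	if succeeded == nil {
		return false // no condition yet: still starting up
	}
	return succeeded.Status != corev1.ConditionUnknown
}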