Enhance user-facing Condition/ContainerTerminated Reasons with backwards compatibility
We have had use cases as specified in #7223 and #7396 that downstream users and Tekton providers would want to have more granular error reasons. As discussed at the API WG and #7390, the existing CRDs {Step, TaskRun, PipelineRun} rely on knative.dev/pkg/apis#Condition for conveying user-facing reasons.
However these Condition reasons are being relied on for our Pipeline For example, dashboard will rely on the error reason to determine the step status.
Blocked issues: Surfacing of actual Termination Reason in Step Status Surface TaskRun failure reason when Step or Pod fails Make user-facing condition messages more granular as canonical error codes
Current workaround: Option 1: Add a new provisional field in the CRDs for {Step, TaskRun, PipelineRun} Condition/ContainerTerminatedState status Option 2: Add it to the condition/containerTerminated message
Future work plan: With a major version bump, i.e. in v2alpha1, we would be able to change the Condition/ContainerTerminated Reason
co-authored with: @renzodavid9
/assign @renzodavid9
@JeromeJu: GitHub didn't allow me to assign the following users: renzodavid9.
Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
/assign @renzodavid9
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign me
@renzodavid9: GitHub didn't allow me to assign the following users: me.
Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
/assign me
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign
Through offline discussion with @renzodavid9 , we've examined that it might not be a feasible way to create a new field in the pod container status to update the container termination reason since these are all knative rather than Tekton conditions.
| Pros | Cons | Example | |
|---|---|---|---|
| Add a new field | More straightforward when for the transition to replace the condition Reasons during the next major version bump | #7545 | |
| Add to the condition status message | Easier to maintain | Users might not be aware of the change |
One other question is that are making changes to PipelineRunStatus Condition Reason and PipelineRunStatus Condition Reason also considered as breaking changes?
It looks like that Dashboard relies on the ConditionStatus on this to determine the completion of a PipelineRun (similarly for TaskRuns). Does this imply that the changes made to PipelineRunStatus/TaskRunStatus Reasons will not be breaking changes for Dashboard and others that consumes such status?
cc @AlanGreene @vdemeester @tektoncd/core-collaborators 🙏
The Dashboard also uses the reason values in a number of places, so changes to these would be considered a breaking change. See response to similar question on another issue proposing changes to the reason values for steps: https://github.com/tektoncd/pipeline/pull/7390#issuecomment-1850954066
Changes for PipelineRun and TaskRun would have similar issues.
Here are just a few examples of the Dashboard's current use of the reason field for PipelineRun and TaskRun:
- https://github.com/tektoncd/dashboard/blob/bd39f464faa0a44a0379afbfdf1a54371d132782/src/containers/TaskRun/TaskRun.jsx#L317
- https://github.com/tektoncd/dashboard/blob/bd39f464faa0a44a0379afbfdf1a54371d132782/packages/components/src/components/RunHeader/RunHeader.scss#L92
- https://github.com/tektoncd/dashboard/blob/bd39f464faa0a44a0379afbfdf1a54371d132782/packages/components/src/components/TaskTree/TaskTree.jsx#L66
- https://github.com/tektoncd/dashboard/blob/bd39f464faa0a44a0379afbfdf1a54371d132782/packages/components/src/components/StatusIcon/StatusIcon.jsx#L65
The Dashboard also uses the reason values in a number of places, so changes to these would be considered a breaking change. Changes for PipelineRun and TaskRun would have similar issues.
Thanks @AlanGreene for confirming the understanding on this 🙏 Would there also be a preference on the 2 options, adding a new field in PipelineRun/TaskRun status for enhancing via a new Reason field or adding the granular reasons into the Condition Status message?
It depends on whether this information is only intended for user consumption or if it's also intended to be used by clients / automation.
If it's intended for clients I don't think the message is a good option. Today I believe most clients treat it as an opaque string and avoid parsing it. If we change this we would need to define a clear API contract for its structure, content, etc. and guard against unexpected changes.
Following on https://github.com/tektoncd/pipeline/issues/7539#issuecomment-1877702017, and according with offline conversation with @JeromeJu , to avoid extending the ContainerStateTerminated from the k8s lib, we can instead modify the StepStatus from Tekton to add the new field. It will look something like this:
apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
name: task1
...
status:
steps:
- container: step-first
imageID: docker-pullable://busybox@sha256:ba76950ac9eaa407512c9d859cea48114eeff8a6f12ebaa5d32ce79d4a017dd8
name: first
terminationReason: AnyReason # <- This will be the new field
terminated:
containerID: docker://aaa111222
exitCode: 1
finishedAt: "2024-01-08T16:27:38Z"
reason: Error
startedAt: "2024-01-08T16:27:38Z"
Following on #7539 (comment), and according with offline conversation with @JeromeJu , to avoid extending the ContainerStateTerminated from the k8s lib, we can instead modify the StepStatus from Tekton to add the new field. It will look something like this:
apiVersion: tekton.dev/v1 kind: TaskRun metadata: name: task1 ... status: steps: - container: step-first imageID: docker-pullable://busybox@sha256:ba76950ac9eaa407512c9d859cea48114eeff8a6f12ebaa5d32ce79d4a017dd8 name: first terminationReason: AnyReason # <- This will be the new field terminated: containerID: docker://aaa111222 exitCode: 1 finishedAt: "2024-01-08T16:27:38Z" reason: Error startedAt: "2024-01-08T16:27:38Z"
Thansk @renzodavid9 for the discussion and this! This looks like the way to go as consensus reached during the API WG 🙏
The next AI for us might be to create a PR RFC(requesting for feedback). i.e.
- https://github.com/tektoncd/pipeline/pull/7545
And we could together have some documentations in the deprecations md and elsewhere to claim our migration plan when moving to v2alpha1.
Hello there!
The open PR for this currently has ~19 tests failing (unit + integration). This is mainly due to the new field, status. steps[].terminationReason, is expected to have an specific value for some tests where we check a TaskRun status against a desired one. In offline discussion with @JeromeJu, it sounds the proper way is to update the tests to add the new field with the expected value + any extra work needed, taking into account that we might push more changes in the same PR.
Other alternative discussed:
- For the current tests failing, during comparison, ignore the
status. steps[].terminationReasonfield, but this might not be ideal, looks better to include the new field in the current tests
Making the observation in case we have comments from members in the community. Thanks a lot! 😄