pipeline icon indicating copy to clipboard operation
pipeline copied to clipboard

Tasks that claim to emit results but don't should fail

Open bobcatfish opened this issue 5 years ago • 19 comments

Expected Behavior

If I create a Task which claims to emit a result, but it doesn't, the TaskRun should fail, i.e. I should be able to rely on the interface a Task claims to provide.

Actual Behavior

This will only be considered a failure if something in the Pipeline tries to use the result; if nothing tries to use the result, there will be no error, but if something does try to use the result you will get an error like:

tkn pr list
NAME                     STARTED          DURATION    STATUS
sum-three-pipeline-run   10 minutes ago   5 seconds   Failed(InvalidTaskResultReference)

Steps to Reproduce the Problem

I took this example pipeline with results and made a few changes:

  • add-task no longer emits a result, tho it claims to
  • the pipeline no longer refers to results from add-task
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: add-task
spec:
  params:
    - name: first
      description: the first operand
    - name: second
      description: the second operand
  results:
    - name: sum
      description: the sum of the first and second operand
  steps:
    - name: add
      image: alpine
      env:
        - name: OP1
          value: $(params.first)
        - name: OP2
          value: $(params.second)
      command: ["/bin/sh", "-c"]
      args:
        - echo -n $((${OP1}+${OP2}))
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: sum-three-pipeline
spec:
  params:
    - name: first
      description: the first operand
    - name: second
      description: the second operand
    - name: third
      description: the third operand
  tasks:
    - name: first-add
      taskRef:
        name: add-task
      params:
        - name: first
          value: $(params.first)
        - name: second
          value: $(params.second)
    - name: second-add
      taskRef:
        name: add-task
      params:
        - name: first
          value: "5"
        - name: second
          value: $(params.third)
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: sum-three-pipeline-run
spec:
  pipelineRef:
    name: sum-three-pipeline
  params:
    - name: first
      value: "2"
    - name: second
      value: "10"
    - name: third
      value: "10"

If you apply this, you will see that it runs successfully. If you add any references to the results, the run will fail.

Additional Info

  • Kubernetes version: v1.16.13-gke.401

  • Tekton Pipeline version: HEAD @ 7b5b2fa3ddd99d52baaa10face967b603c6940ee

bobcatfish avatar Nov 04 '20 16:11 bobcatfish

~~FYI in https://github.com/tektoncd/pipeline/pull/3472 we're removing the behaviour where a PipelineRun is put into a failed state due to a Task dropping a result. Feedback welcome!~~

Edit for posterity: I was wrong - a PipelineRun was never put into a failed state due to a Task dropping a result. Only if the PipelineRun couldn't fetch the assocaited PipelineSpec while trying to resolve those result references. Ignore this message!

ghost avatar Nov 04 '20 16:11 ghost

thanks @bobcatfish, https://github.com/tektoncd/pipeline/issues/2701 has some discussion as well

pritidesai avatar Nov 04 '20 16:11 pritidesai

I feel like this depends on what we define with "producing a result". I think with the current code today is:

  • no file created -> result not produced
  • file exists, empty -> empty result is produced (which will be an empty string)

That said, it is probably good to continue to force tasks to produce a result like they do today (unless we introduce support for default values, and a task specifies a default), as this would help task authors and users to avoid silent failures from being ignored.

afrittoli avatar Feb 01 '21 14:02 afrittoli

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot avatar May 02 '21 14:05 tekton-robot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot avatar Jun 01 '21 15:06 tekton-robot

@bobcatfish do you still want to pursue this one or should we leave it for the time being?

ghost avatar Jun 15 '21 16:06 ghost

Note: with the new feature gate / flagging work this would be classified as a backwards incompatible change and would therefore need its own feature flag. It could be made the default for a new apiVersion, such as v1.

ghost avatar Jun 15 '21 16:06 ghost

Feels like we are still not 100% sure what the desired behavior is here so I'd like to keep discussing

/remove-lifecycle rotten

bobcatfish avatar Aug 10 '21 16:08 bobcatfish

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot avatar Nov 08 '21 17:11 tekton-robot

I'd still like to bring this forward at some point and decide how to address this as a community - maybe as part of v1 discussions? but it hasnt been a priority yet :(

/remove-lifecycle stale

bobcatfish avatar Nov 09 '21 21:11 bobcatfish

In TEP-0075, we are planning to support object results. These results embrace the optional by default behavior of JSON Schema. A task can produce an object result with a missing attribute and it won't fail. However, a subsequent task will fail if it attempts to use the missing attribute from the object result, causing the pipeline to fail. That makes the object results attributes optional if not used, and required if used. That behavior is consistent with the existing behavior reported in this issue as a bug, so wondering if we should document this behavior and close this issue?

jerop avatar Feb 04 '22 02:02 jerop

I'm still leaning toward failing tasks that claim to emit results but fail and I'll try to explain why.

The Task spec defines the interface of the task, including:

  • params it needs to be operate, which can be optional if defaults are provided
  • results it declares it will produce, which consumers of the Task should be able to expect and rely on

In TEP-0075 we add support for results to declare they are of type object and we start to support defining the schema of that object, using JSON schema.

The crux of the problem is, what does it mean if a Task declares it will produce a result with a JSON schema indicating it will have key "foo" but it does not have that key at runtime? JSON schema allows for keys to be declared as "required" and assumes all keys are optional by default. So if a Task declares that it will produce a result with the key "foo" and it does not, it is still technically meeting the contract it declared in that the schema it defined did say that key was optional 😭

However I think it is different when we are talking about the results themselves; I don't think the same logic we apply to each individual result that a Task declares it produces needs to the same as the application of the schema we apply to validate those results. imo it all comes down to: did the Task fulfill the contract it declared. If it said it would make a result and did not make that result, I believe it did not fulfill its contract.

So in the example of a Task declaring a result with a schema indicating it is an object with (implied optional) key foo, I would think:

  • Producing no result at all (i.e. writing nothing to /tekton/results/my-result) would be an error
  • Producing an empty object (i.e. writing {} to /tekton/results/my-result) would fulfill its contract (it emitted the result it said it would and it matched the schema)
  • Producing an object with key "foo" (i.e. writing {"foo": "bar"} to /tekton/results/my-result) would fulfill its contract
  • Producing an object with key "foo" and additional keys (e.g. writing {"foo": "bar", "baz":"qux"} to /tekton/results/my-result) would also fulfill the contract (unless the schema included additionalProperties:false)

All of that being said I can see why this would be confusing 😅 maybe we can find a compromise in TEP-0075 that manages to balance both worlds, I'm going to make a suggestion over there.

bobcatfish avatar Feb 07 '22 22:02 bobcatfish

@bobcatfish thank you for the detailed discussion above - makes sense to distinguish between producing results and keys in object results - especially given that users will have control to say that they keys are required if they wish

agreed, we can go ahead with making the results themselves required - we'll need to explain this well in the docs though (cc @vinamra28)

jerop avatar Mar 15 '22 12:03 jerop

/assign @vinamra28

pritidesai avatar May 03 '22 16:05 pritidesai

@pritidesai: GitHub didn't allow me to assign the following users: vinamra28.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/assign @vinamra28

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot avatar May 03 '22 16:05 tekton-robot

/assign

vinamra28 avatar May 12 '22 15:05 vinamra28

@vinamra28 is implementing TEP-0048 which addresses this issue

pritidesai avatar Aug 23 '22 16:08 pritidesai

related: https://github.com/tektoncd/pipeline/pull/6817

dibyom avatar Jul 17 '23 16:07 dibyom

As discussed during the API WG on 17.07, moving this to nice-to-have for v1, in favour of https://github.com/tektoncd/pipeline/issues/6932

afrittoli avatar Jul 17 '23 16:07 afrittoli