chains
chains copied to clipboard
Parse Type-hinted StepResults for the Provenance
Feature request
Parse type-hinted StepResults and surface them appropriately in the provenance
Use case
StepActions were released in Tekton Pipelines v0.54.0. As a part of that, we introduced StepResults. StepResults are not automatically surfaced to TaskResults. Users explicitly need to fetch them like they do for PipelineResults from TaskResults. As a result, if a type-hinted StepResult is not explicitly surfaced by the user then it will go unnoticed by Chains and thus not be surfaced as a ResolvedDependency/Material, Subject or Byproduct.
The solution is simple. The formatter just needs to additionally look into the StepState for StepResults.
Proposal
Add an extra key isBuildArtifact with a boolean value defaulted to false to the Type-hinted Object StepResults. When Chains parses the step results, it can look for that field and insert the output artifact into subject or byProducts. In the new Artifacts Struct defined in https://github.com/tektoncd/community/blob/main/teps/0147-tekton-artifacts-phase1.md, we can add a similar field to outputs.
Pros
- Chains gets accurate intention about what to do with the output artifact.
- The
StepAction/Taskcan actually ask the intent to thePipeline/TaskRunvia params.- That way, since it is the
Pipelinethat really understands the complete picture, it can indicate the nature of the artifact that the underlyingStepAction/Taskwas used to fetch (i.e.subjector abyProduct).
- That way, since it is the
Example
apiVersion: tekton.dev/v1alpha1
kind: StepAction
metadata:
name: gcs-upload
spec:
image: gcs
params:
- name: isBuildArtifact
description: Should the file/folder you are uploading be considered as a build artifact (subject) or a byProduct?
default: "false"
results:
- name: upload_ARTIFACT_OUTPUT
type: object
script: |
echo {"uri:" ..., "digest": ..., "isBuildArtifact": "$(params.isBuildArtifact)"} | tee $(step.results.upload_ARTIFACT_OUTPUT.path)
---
apiVersion: tekton.dev/v1
kind: TaskRun
spec:
params:
- name: isBuildArtifact
value: "true"
taskSpec:
steps:
- name: step-producing-artifact
ref:
name: gcs-upload
params:
- name: isBuildArtifact
value: $(params.isBuildArtifact)
Behavior amongst Task Results, Pipeline Results and StepResults
Based on extensive discussions in the Tekton Chains WG, we've reached the following conclusion:.
To provide a consistent behavior across TaskResults, PipelineResults and StepResults we propose the following.
For non-object type-hinted results, we assume that they are all byProducts.
For object results, we assume that by default *ARTIFACT_OUTPUTS are byProducts. If a matching key isBuildArtifact is found and is set to true then Chains considers it as a subject.
Note that this currently not the behavior for Task/Pipeline Results (they are subjects by default). This behaviour will be made consistent with StepResults so that across the board, users can expect to do the same thing. i.e. declare something as a build Artifact to treat it as a subject.
To not break the users immediately, we, propose gating this behind slsa/v2alpha4 so that users currently using slsa/v2alpha2,3 do not experience this change. However, once we mature to slsa/v2beta1, we will deprecate older alpha versions and this (i.e. *ARTIFACT_OUTPUTS are byProducts by default and that users have to specifically declare it as a subject) will be the expected behavior.
/good-first-issue /help-wanted
@chitrangpatel: This request has been marked as suitable for new contributors.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.
In response to this:
/good-first-issue /help-wanted
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
cc @lcarva @wlynch @aaron-prindle
Looking for feedback.
/assign
No objections!
My guess is things that are produced by steps but not surfaced to the Task should be byproducts though? So a little bit different than the Task -> Pipeline extraction.
I think we can rely on type-hints to drive that decision. Currently, Type-hinted results get populated in ResolvedDependencies/Subjects. Other results are inserted as byProducts.
I can imagine a case where a git-clone StepAction produces type-hinted StepResults that is not necessarily surfaced to the Task because it may not be needed to be sent to downstream Tasks (or in case of a TaskRun, not needed to be surfaced at all). I think Chains should still insert it into ResolvedDependencies not byProducts. Same argument for Images produced etc.
Summarizing the discussion from Slack and adding some more thoughts:
Context:
When parsing Artifacts/Type-hinted StepResults from StepActions, should we pollute the subjects with thwm or byProducts?
- Subjects are the critical build Products.
- ByProducts are things like cache, other artifacts (e.g. coverage, logs etc.), other results that were produced as part of the build but are not the main build products.
Proposal:
StepAction’s type-hinted StepResultsshould be considered asbyProductsby default.- This is mainly driven by the fact that a
StepActionis the smallest unit and has no knowledge about the complete build pipeline.- e.g. a gcs-upload StepAction might push an artifact and type-hint the results.
- But whether the pushed artifact is a
Subject(critical build-artifact) or abyProductis better known by the user of theStepActioni.e. aTask. Although it is best known by the highest level of orchestration (i.e.Pipelinein case of aPipelineRun) - Therefore, any output artifact that is surfaced via
Task-level Type-hinted Resultshould be considered a subject but notStepResult.
- This is mainly driven by the fact that a
Pros:
- Chains will not pollute the
Subjectsif we treat things asbyProducts.- Whats the concern here? It seems like the bigger concern is users forgetting to take that extra step and having artifacts without provenance (?)
Cons:
- Prior to
StepActions, the same logic applies to remoteTasks, yet we consider any type-hinted results that indicate an output artifact as a subject. So technically, there it’s the pipeline author that truly knows which artifact is a Subject vs a byProduct. - What about users that don't use
StepActionsand still treatTasksas the smallest unit? Why are the Task Results in that case treated differently? Shouldn't by a similar logic, we only considertype-hintedoutput Artifact relatedPipelineResultsassubjects? - In contrast, if users forget to explicitly surface results up the ladder then they do not have a provenance.
- In the current workflow, the users of remote
Tasks,StepActionsdo not need to worry about having to do anything special to get the provenance. The providers of theseTasksandStepActionsalready perform type-hinting and follow best practices to get thins into Chains. - Today, if people are simply converting their Pipeline runs into Taskruns with StepActions, this will cause a breaking behavior.
- We would also be treating Tasks using StepActions different from Tasks with inlined Steps?
Other thoughts:
- If users are
type-hintingtheirTasks/StepActionsthen it should be considered important and so asubject.- Currently, if things are not type-hinted, we already list them under
byProductsso I think that we should consider anythingtype-hintedas important. - For certain
StepActions(e.g. upload-cache etc.) authors don't need to type-hint the results at that stage. They can leave it to the users to decide whether to type-hint it at theTasklevel. - Just because its produced from a StepAction should probably not determine that its not important.
- Worst case, even if we have a few additional subjects that aren't too important, there is no harm done. Not catching them seems worse. I wouldn't consider it as polluting the Subjects. I would probably treat it that way if we did that with all our results.
- Currently, if things are not type-hinted, we already list them under
- In
SLSA v0.1, we did not havebyProductsand so our behavior was to treat all type-hinted results as subjects. This would be the case today as well if we didn't haveStepActions. What ifSLSAgets rid of byProducts again in the future in a2.0? Would we go back to treating it as aSubjectthere?
The solution where a difference in the behavior of Chains based on the BuildType might be ok here but I wanted to try and convince everyone that treating type-hinted StepResults as subjects is not a bad thing.
Thoughts @wlynch (hopefully, I didn't miss anything major)?
Synced offline with @wlynch
Proposal
Add an extra key isBuildArtifact with a boolean value defaulted to false to the Type-hinted Object StepResults. When Chains parses the step results, it can look for that field and insert the output artifact into subject or byProducts. In the new Artifacts Struct defined in https://github.com/tektoncd/community/blob/main/teps/0147-tekton-artifacts-phase1.md, we can add a similar field to outputs.
Pros
- Chains gets accurate intention about what to do with the output artifact.
- The
StepAction/Taskcan actually ask the intent to thePipeline/TaskRunvia params.- That way, since it is the
Pipelinethat really understands the complete picture, it can indicate the nature of the artifact that the underlyingStepAction/Taskwas used to fetch (i.e.subjector abyProduct).
- That way, since it is the
Example
apiVersion: tekton.dev/v1alpha1
kind: StepAction
metadata:
name: gcs-upload
spec:
image: gcs
params:
- name: isBuildArtifact
description: Should the file/folder you are uploading be considered as a build artifact (subject) or a byProduct?
default: "false"
results:
- name: upload_ARTIFACT_OUTPUT
type: object
script: |
echo {"uri:" ..., "digest": ..., "isBuildArtifact": "$(params.isBuildArtifact)"} | tee $(step.results.upload_ARTIFACT_OUTPUT.path)
---
apiVersion: tekton.dev/v1
kind: TaskRun
spec:
params:
- name: isBuildArtifact
value: "true"
taskSpec:
steps:
- name: step-producing-artifact
ref:
name: gcs-upload
params:
- name: isBuildArtifact
value: $(params.isBuildArtifact)
/assign @renzor
@chitrangpatel: GitHub didn't allow me to assign the following users: renzor.
Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
/assign @renzor
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign me
@renzodavid9: GitHub didn't allow me to assign the following users: me.
Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
/assign me
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @renzodavid9