chains icon indicating copy to clipboard operation
chains copied to clipboard

Chains missing `configSource` in `predicate.invocation`

Open chuangw6 opened this issue 2 years ago • 13 comments

Expected Behavior

The intoto attestation generated by Chains should include configSource section in predicate.invocation that is supposed to have 3 fields uri, digest and entryPoint per SLSA spec.

Actual Behavior

Chains currently doesn't set this field at all for taskrun level provenance.

chuangw6 avatar Aug 16 '22 20:08 chuangw6

RE: Comment "we currently don't set ConfigSource because we don't know which material the Task definition came from"

We might need to preserve an entry in the materials that is specifically for invocation.configSource.uri and invocation.configSource.digest.

If the entry in materials created with the values of CHAINS-GIT_COMMIT and CHAINS-GIT_URL params is what we want for configSource, we also need a preserved param name i.e. CHAINS-ENTRYPOINT for invocation.configSource.entryPoint.

Thoughts?

chuangw6 avatar Aug 16 '22 20:08 chuangw6

cc @wlynch @priyawadhwa @lcarva @bcaton85 @jagathprakash

chuangw6 avatar Aug 16 '22 20:08 chuangw6

IIUC, this would be source information for the Run object? We don't really have a concept of that today, since all Run objects come to us directly through the k8s API.

CHAINS_GIT* is really source provenance information, which describes what is happening inside the build not the how the build was created.

Top level Run refs (i.e. pipelineRef, taskRef) are probably going to be the closest approximation, but really we might need something like Workflows in place first.

wlynch avatar Aug 16 '22 21:08 wlynch

IIUC, this would be source information for the Run object? We don't really have a concept of that today, since all Run objects come to us directly through the k8s API.

From slsa spec,

invocation.configSource object, optional Describes where the config file that kicked off the build came from. This is effectively a pointer to the source where buildConfig came from.

For the url/digest, I am thinking the repo url/digest that triggers the pipelinerun i.e. the gitrevision and gitrepositoryurl in this example.

If so, we can teach trigger to record the config source information in the annotation of the run object so that Chains can pick up that information rather than using params. wdyt?

chuangw6 avatar Aug 16 '22 21:08 chuangw6

Triggers are almost too generic here since they don't have a strong affinity for particular source or could have multiple sources, etc. I think workflows are meant to make this a bit more opinionated.

I'm hesitant to give special meaning to a user supplied value - any ideas on how can we verify this is actually correct? @Yongxuanzhang this sounds similar to trusted resources, but instead of author identity, we want to verify the repo source👀 (idk if this is a good idea but maybe this is something that could be a part of the signature?)

Basing it off of top level refs might work, since you're only ever going to reference one source. Main downside is it wouldn't capture the Run config itself, but rather its referenced object. Though maybe this is okay? If you think of the Run object as more as a request rather than a config, it might be fine. (I have to think through this some more)

@dibyom (not sure if you're the best person for this - feel free to redirect) has there been any discussion around remote resolution of Run objects that might be relevant here?

wlynch avatar Aug 16 '22 23:08 wlynch

Could this be a good place to store information about the Task (for TaskRun attestations) and the Pipeline (for PipelineRun attestations) definitions? Currently, this information is not stored in the attestation at all.

If a Tekton Bundle is used, this would be really useful:

  • uri -> OCI image reference
  • digest -> digest of the image? of the Task/Pipeilne definition?
  • entrypoint -> name of the task within the bundle

I guess it would be a bit of a stretch to make those attributes meaningful for in-cluster Task/Pipeline definitions but it seems plausible.

lcarva avatar Aug 17 '22 13:08 lcarva

I'm hesitant to give special meaning to a user supplied value - any ideas on how can we verify this is actually correct? @Yongxuanzhang this sounds similar to trusted resources, but instead of author identity, we want to verify the repo source👀 (idk if this is a good idea but maybe this is something that could be a part of the signature?)

Could you elaborate more on "that could be a part of the signature", you mean also signing it or it can be another signature that we want to verify?

Yongxuanzhang avatar Aug 17 '22 14:08 Yongxuanzhang

@lcarva

Could this be a good place to store information about the Task (for TaskRun attestations) and the Pipeline (for PipelineRun attestations) definitions? Currently, this information is not stored in the attestation at all.

👍 This is inline with what I was thinking with top level refs.

I guess it would be a bit of a stretch to make those attributes meaningful for in-cluster Task/Pipeline definitions but it seems plausible.

I actually don't think it's that far off! entrypoint is optional so we could omit it for API objects that only have 1 item, uri could be the API path (i.e. /apis/tekton.dev/v1beta1/namespaces/default/tasks/foo, maybe throw in an @UUID to uniquely ID it), digest could be a SHA256 digest of the json.


@Yongxuanzhang

Could you elaborate more on "that could be a part of the signature", you mean also signing it or it can be another signature that we want to verify?

Either or. Just wanted to call it out since it feels like a similar problem.

wlynch avatar Aug 18 '22 14:08 wlynch

Could this be a good place to store information about the Task (for TaskRun attestations) and the Pipeline (for PipelineRun attestations) definitions? Currently, this information is not stored in the attestation at all.

The explanation for the invocation field in slsa spec is "Identifies the event that kicked off the build.". In that sense, I feel like refs (i.e. oci bundles url, digest, task/pipeline name) are something part of the build instead of an event that kicked off the build itself. Also, how are we going to store refs in a pipelinerun attestation's invocation field in case there are multiple oci bundle refs used in the pipeline?

chuangw6 avatar Aug 18 '22 15:08 chuangw6

The explanation for the invocation field in slsa spec is "Identifies the event that kicked off the build.". In that sense, I feel like refs (i.e. oci bundles url, digest, task/pipeline name) are something part of the build instead of an event that kicked off the build itself.

Bit of a philosophical question which is why I'm on the fence here - is there meaningful difference between kubectl apply -f taskrun.yaml and tkn task start task.yaml if they both generate the equivalent API request? The latter feels like it fits the definition of the configSource field (especially if you compare this to similar flows like GCB yamls or GitHub workflow_runs).

Also, how are we going to store refs in a pipelinerun attestation's invocation field in case there are multiple oci bundle refs used in the pipeline?

There's only 1 top level ref for a pipeline (the pipeline itself might have several taskrefs, but those would apply to the tasks, not the pipeline).

wlynch avatar Aug 18 '22 15:08 wlynch

Observation / question: for human invocations of kubectl or tkn task start, is the intention that invocation will be blank (which I suppose means "unknown"?) or will we explicitly populate "manual" or some such?

bendory avatar Aug 23 '22 15:08 bendory

@dibyom (not sure if you're the best person for this - feel free to redirect) has there been any discussion around remote resolution of Run objects that might be relevant here?

I'm not sure what this means :) We can resolve pipelines and tasks from a variety of places but what does it mean to resolve a run remotely?

I do have a question though - The SLSA spec says the invocation field "Identifies the event that kicked off the build." while the invocation.configSource field "Describes where the config file that kicked off the build came from".

For Tekton, this config file (i.e. the pipeline definition) could come from git, oci registry, the cluster or could even be embedded in the run. To me, it seems like the pipelineRef or the pipelineSpec fields are better indicators of where the config file came from rather than the incoming event?

Re: for passing provenance info from the eventing/triggering side, I generally agree with @wlynch 's assesment that Workflows will make it better (probably with the eventSource field. That being said, I think we can still make progress/prototype with current Triggers. Some ideas based on chatting with @chuangw6 and @bendory

  1. TriggerBindings are user specified and are probably not a good place to capture the triggering provenance. But we could have the interceptors provide this info. For instance, the GitHub interceptor could attach the needed provenance fields as part of extensions with a well-known name.

  2. We could configure the EventListeners to look for these well known extension fields and add them to the pipeline runs being created either as annotations or well known extra parameter passed on to the pipelineRun (or as part of additional input contexts in the future). Chains can observe these known annotations/params for provenance generation.

  3. This assumes we are trusting the Triggers EventListener and the interceptor. We could also decide to have the EventListener sign the information that is passed down and use a similar mechanism as Trusted Tasks to verify.

dibyom avatar Aug 23 '22 15:08 dibyom

I do have a question though - The SLSA spec says the invocation field "Identifies the event that kicked off the build." while the invocation.configSource field "Describes where the config file that kicked off the build came from". For Tekton, this config file (i.e. the pipeline definition) could come from git, oci registry, the cluster or could even be embedded in the run. To me, it seems like the pipelineRef or the pipelineSpec fields are better indicators of where the config file came from rather than the incoming event?

Agreed! This echoes what @wlynch and @lcarva said about top-level refs (i.e. source of pipeline/task definition). We can focus on this instead of trigger information.

git source

gitresolver expects from the author the exact three fields that we need for the configSource url, revision and pathInRepo. However, there are some challenges:

  • revision field can be commit SHA, branch or tag, whereas SLSA's invocation.configSource.digest requires digest only. So we need to map the branch/tag to actual commit SHA.
  • We need to pass down this information for Chains to pick up without modifying user-defined spec.
  • Also, we need to make sure we can standardize the 3 fields that have corresponding values in other sources i.e. oci bundle mentioned by @lcarva.

I created this PoC to try to solve the 3 challenges. https://github.com/tektoncd/pipeline/pull/5397 Please take a look.

oci bundle

Agreed with @lcarva 's suggestion for oci bundle. The bundle param expected by bundle resolver can be url@digest or url@tag. So we also need to fetch the actual digest if tag is provided.

in-cluster definition

Agreed with @wlynch said

uri could be the API path (i.e. /apis/tekton.dev/v1beta1/namespaces/default/tasks/foo, maybe throw in an @UUID to uniquely ID it), digest could be a SHA256 digest of the json.

But if the actual yaml is not in source control, do we want to consider this case and record the information in SLSA provenance?

inline/embedded definition

If pipelinerun.yaml is in source control, do we want to record its source information?

catalog

TODO: will follow up on this later.

chuangw6 avatar Aug 31 '22 00:08 chuangw6

/close

as completed by https://github.com/tektoncd/chains/pull/554

chuangw6 avatar Dec 02 '22 23:12 chuangw6

@chuangw6: Closing this issue.

In response to this:

/close

as completed by https://github.com/tektoncd/chains/pull/554

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot avatar Dec 02 '22 23:12 tekton-robot