Provenance: how to record version information about the `builder`
In general, how should a builder record information about its own version in the provenance?
From @laurentsimon on https://github.com/in-toto/in-toto-golang/issues/159 (with edits from me):
The provenance `builder` field only contains a single `id`, but it would be beneficial to add a `version` and `digest` as well.
An alternative would be to use ID=`theID:version@hash`. I don't think this is a good approach because it treats the builder differently from the rest of the provenance information (`invocation.configSource` contains a digest, for example).
One may ask why we need a version if we have a hash. Versions are useful during verification of the provenance: based on the version, the verifier can adjust its logic/verification. Hashes don't allow doing this easily.
Maybe I'm misunderstanding the purpose of `provenance.builder`, and the intention may be to put the data I'm after inside `invocation.configSource`? (which does not contain a version field either).
Note: my use case is a build using a GitHub action on GitHub. The builder is the action (which has a version and a hash).
cc @asraa
The builder.id field represents the transitive closure of the trusted computing base of the builder, meaning the set of things we have to trust that had influence over the provenance generation. That cannot really be hashed or versioned because it is not one piece of software but a collection of systems.
For example, if the builder is a particular GitHub Action, then the builder.id represents not just that action's code but also the GitHub Actions runner base image, the GitHub/Azure infrastructure (hypervisor, control plane, key management, etc.), the provenance signing infrastructure (e.g. Sigstore), and so on. The hash of the action's code does not cover all of those things.
That said, the hash/version of what is available is extremely useful for incident response, and we should have a recommendation on where to add it.
- One way is to list it in `materials`, though ideally we'd have a way to differentiate different types of materials (I think we have an open issue for that.) You can do that now.
- Another way is to add a new field in `builder` that allows capturing these different software versions, akin to `invocation.configSource`. That is also a reasonable route, though it would require a spec change.
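The first option can be tried without a spec change. A minimal sketch in Python, assuming the SLSA v0.2 provenance layout in which each `materials` entry carries a `uri` and a `digest` map. The function name, repository, tag, and digest are hypothetical placeholders, not values from a real build.

```python
import json


def make_predicate(builder_id: str, action_uri: str, action_sha1: str) -> dict:
    """Build a partial provenance predicate.

    builder.id stays the trust anchor; the action's pinned commit is
    recorded as an ordinary material so it is available for incident response.
    """
    return {
        "builder": {"id": builder_id},
        "materials": [
            # Assumption: the action's commit is recorded like any other material.
            {"uri": action_uri, "digest": {"sha1": action_sha1}},
        ],
    }


predicate = make_predicate(
    builder_id="https://github.com/example-org/example-action@refs/tags/v1",
    action_uri="git+https://github.com/example-org/example-action@refs/tags/v1",
    action_sha1="0" * 40,  # placeholder digest
)
print(json.dumps(predicate, indent=2))
```

Nothing in this layout marks the entry as "the builder's code" rather than a source input, which is the differentiation gap mentioned in the first bullet.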
As for the builder ID to use, I would just use whatever GitHub gave you, so either <action>@<ref> or <action>@<hash>, depending on what the user wrote in their workflow. That is the trust anchor, so it makes sense to me.
When we talk about the builder version, are we talking about:
1. The version of the build service?
2. The version of the executable that the build service runs?
If 1, note that a complex build service may not have a single version number to reference; it might be composed of multiple different binaries run on multiple different machines. I suppose the same potentially applies to 2, but we might be able to scope it down to "the version of the binary that signed the provenance" (even provenance generation could be spread among multiple binaries).
We use the builder.id field to determine if an artifact was built by the expected build service. When handling thousands of policies, we'd rather not have to worry about updating all the policies to point at different versions just because the builder rev'd their version number.
I do see value in having some way to indicate if the build service has made a significant change to its behavior that changes how downstream users should trust it. Could such a version number just be built into the builder.id string?
[edit] Seems like Mark beat me to it.
As for the builder ID to use, I would just use whatever GitHub gave you, so either <action>@<ref> or <action>@<hash>, depending on what the user wrote in their workflow. That is the trust anchor, so it makes sense to me.
I'd be wary of <action>@<hash> as the builder id. Someone might decide to clarify comments and cause existing policies to be invalidated. <action>@<tag> or <action>@<branch> might be preferred for builder.id?
As for the builder ID to use, I would just use whatever GitHub gave you, so either <action>@<ref> or <action>@<hash>, depending on what the user wrote in their workflow. That is the trust anchor, so it makes sense to me.
I'd be wary of <action>@<hash> as the builder id. Someone might decide to clarify comments and cause existing policies to be invalidated. <action>@<tag> or <action>@<branch> might be preferred for builder.id?
Let me clarify some background, to make sure everyone is on the same page. In Laurent's use case, the "builder" is a combination of the GitHub Actions service + a trusted reusable workflow. GitHub creates an OIDC token identifying the reusable workflow (job_workflow_ref). This is whatever branch, tag, or hash the project's YAML config uses to reference the reusable workflow. If they listed branch main, it's @refs/heads/main; if tag v1.2, it's @refs/tags/v1.2; if hash abcd1234…, it's @abcd1234… (I think). My recommendation is to use this job_workflow_ref as the builder.id field because that is what GitHub actually guarantees. The verifier would then check that the builder.id in the payload (1) matches exactly the value in the signing certificate and (2) is acceptable to the policy.
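The two verifier checks can be sketched as follows. This is a minimal illustration in Python, assuming the verifier has already validated the signature and extracted the workflow ref from the signing certificate. The function name, the workflow path, and the idea of a policy as a plain set of allowed IDs are hypothetical.

```python
def verify_builder_id(payload_builder_id: str,
                      cert_workflow_ref: str,
                      allowed_builder_ids: set) -> bool:
    """Return True only if both checks pass."""
    # Check (1): the builder.id in the payload must match exactly the
    # identity that was bound into the signing certificate.
    if payload_builder_id != cert_workflow_ref:
        return False
    # Check (2): the identity must be one the policy accepts.
    return payload_builder_id in allowed_builder_ids


# Hypothetical reusable workflow path and policy.
WORKFLOW = "example-org/example-repo/.github/workflows/build.yml"
tag_ref = f"{WORKFLOW}@refs/tags/v1.2"
hash_ref = f"{WORKFLOW}@{'a' * 40}"
policy = {tag_ref}

print(verify_builder_id(tag_ref, tag_ref, policy))    # True: both checks pass
print(verify_builder_id(tag_ref, hash_ref, policy))   # False: payload and certificate differ
print(verify_builder_id(hash_ref, hash_ref, policy))  # False: hash-pinned ref not in policy
```

The third call shows the friction with hash-pinned references: check (1) passes, but check (2) fails unless the policy lists that specific hash.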
I believe what you are saying is that having a hash in the builder.id makes (2) more difficult. I agree, but I'm not sure there's a way around it. People use hashes to pin their dependencies, so we likely can't mandate using a tag instead. Combined with the fact that anyone can cause a commit to show up in a GitHub repo (more info), we need some way to verify that the reusable workflow was a "good" one, or else the rest of the provenance cannot be trusted.
I think the bug you referenced is just restricted to the GitHub UI. It can't actually be used to trick someone cloning the repo to get arbitrary code in there (can it?).
If we go this route, I guess I'd suggest that we either recommend people use the branch or at least make it very clear that any downstream users' policies will break if they choose a different hash/branch/tag. If we don't, I think this behavior will be quite surprising.
I think it would be fair, when using <action>@<branch>, for the implied trust model to be "I trust the way the referenced repository is maintained and don't need to validate each specific version" in the same way that you trust regular source that is pulled from a branch of some repo.
I also think it would be good to list the hash somewhere so that people can debug issues and identify problematic attestations.
I think the bug you referenced is just restricted to the GitHub UI. It can't actually be used to trick someone cloning the repo to get arbitrary code in there (can it?).
I don't know. We'd need to test to find out. At a minimum, it can be used to point to arbitrary branches, which means we'd need to be very careful about what is pushed.
Another option might be to have a second attestation on that commit showing that it is an ancestor of the proper branch.
related https://github.com/gossts/slsa-provenance/issues/21
Want to chime in to +1 this!
I've been talking to a few folks informally (cc @lumjjb @chuangw6 @mattmoor @mlieberman85) about wanting a place for additional details in the SLSA spec about service infra, since if we're running on vulnerable infra it's tough to trust attestations built on top of it.
As another use case, this is something I was thinking would be useful for Tekton / Tekton Chains so that we could record:
- the Tekton Pipelines version used to run the build
- the Tekton Chains version used to create the attestation
- the Kubernetes version Tekton is running on
There is currently a version field but:
- We may want to offer a `bomLink` or similar to reference a CycloneDX or SPDX bill of materials for the service.
- We may want to allow arbitrary attributes.
Is it not sufficient to use version + extension fields? #764 raises the prominence of extension fields to hopefully make it more clear that this is an option.
If so, then I suggest that we close this issue as resolved.
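For the Tekton use case above, version plus extension fields might look like the following. This is a sketch only, assuming `version` is a map from component name to version string and that an extension field is an extra key with a vendor prefix. The helper name, the `example_bomLink` key, the builder ID, and every version value are hypothetical placeholders.

```python
import json


def make_builder(builder_id: str, versions: dict, extensions: dict = None) -> dict:
    """Assemble a builder object with a version map and optional extension fields."""
    builder = {"id": builder_id, "version": dict(versions)}
    for name, value in (extensions or {}).items():
        # Do not let an extension overwrite a standard field.
        if name in builder:
            raise ValueError(f"extension field collides with standard field: {name}")
        builder[name] = value
    return builder


builder = make_builder(
    builder_id="https://example.com/tekton-chains",  # placeholder ID
    versions={
        "tekton-pipelines": "0.0.0",  # placeholder versions
        "tekton-chains": "0.0.0",
        "kubernetes": "0.0.0",
    },
    # Hypothetical extension pointing at a bill of materials for the service.
    extensions={"example_bomLink": "https://example.com/service-sbom.json"},
)
print(json.dumps(builder, indent=2))
```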