slsa-github-generator
Set arch key in Environment
Get and set the architecture for the build.
Since the job that built the artifact could be using a different CPU architecture than the job running the provenance generation, this probably needs to be added as an optional input.
somewhat related to https://github.com/slsa-framework/slsa-github-generator-go/pull/16 about build environment /cc @joshuagl
If we support runners with different architectures in the future this will unfortunately not work in the case of the provenance-only workflow since provenance generation will happen in a separate job VM and the build step happens in a VM outside of the reusable workflow's control.
AFAICT, each job can be run on a separate runner which could be a different architecture. Github Actions kind of makes it hard to determine this definitively since the architecture is determined implicitly via labels.
We technically could assume it's amd64 for the time being since we are only supporting hosted runners for now though.
somewhat related to slsa-framework/slsa-github-generator-go#16 about build environment /cc @joshuagl
If we support runners with different architectures in the future this will unfortunately not work in the case of the provenance-only workflow since provenance generation will happen in a separate job VM and the build step happens in a VM outside of the reusable workflow's control.
agreed. I think more generally, we need to decide how we convey the fact that some statements in the provenance are forgeable (taken from user) and some are not (repo commit hash). We could:
- Only populate statements that are non-forgeable... this falls apart since the artifact hash itself is forgeable :/
- Add a "statement of non-forgeability" to accompany non-forgeable statements; but I doubt this is doable in the current specs
@MarkLodato what did you have in mind?
AFAICT, each job can be run on a separate runner which could be a different architecture. Github Actions kind of makes it hard to determine this definitively since the architecture is determined implicitly via labels.
Do you know if labels can be set on GitHub-hosted runners too? I think that was the case but I don't remember for sure. If so, we could use this list https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#choosing-github-hosted-runners as the ground truth for GitHub-hosted runners, and consider everything else as self-hosted.
Btw, since the API will be used by ecosystem-specific reusable workflows, is the plan for "arch" to be an input to the API, set by trusted reusable workflows?
Do you know if labels can be set on GitHub-hosted runners too? I think that was the case but I don't remember for sure. If so, we could use this list https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#choosing-github-hosted-runners as the ground truth for GitHub-hosted runners, and consider everything else as self-hosted.
I don't think you can set labels on GitHub-hosted runners. Hosted runners are amd64 only: you can't choose the arch, just the OS. So we could just hard-code it to amd64 since we only support GitHub-hosted runners, but I don't know if there's anything to theoretically stop someone from using a self-hosted runner to build and a GitHub-hosted runner for provenance. Or even self-hosted runners for both building and generating provenance. I'm not sure if there is a way to tell since I don't know too much about self-hosted runners.
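The idea of treating the documented list of GitHub-hosted labels as ground truth, and everything else as self-hosted, could be sketched roughly as below. This is only an illustration: the hostedLabels set is a small assumed subset (the linked documentation is the authoritative list and it changes over time), and IsGitHubHosted is a hypothetical helper, not part of any existing package. It also assumes the runs-on labels have already been extracted from the workflow file and are literal strings rather than expressions.

```go
package main

import "fmt"

// hostedLabels is an ASSUMED, illustrative subset of GitHub-hosted runner
// labels. The GitHub documentation is the authoritative list.
var hostedLabels = map[string]bool{
	"ubuntu-latest":  true,
	"windows-latest": true,
	"macos-latest":   true,
}

// IsGitHubHosted is a hypothetical helper. It reports true only when at
// least one label is given and every label is in the known hosted set;
// anything else is conservatively treated as self-hosted.
func IsGitHubHosted(runsOn []string) bool {
	if len(runsOn) == 0 {
		return false
	}
	for _, label := range runsOn {
		if !hostedLabels[label] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(IsGitHubHosted([]string{"ubuntu-latest"}))
	fmt.Println(IsGitHubHosted([]string{"self-hosted", "linux"}))
}
```

Defaulting to "self-hosted" for unknown labels errs on the side of distrust, which seems to match the policy concern, but it would misclassify any hosted label missing from the list.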
Btw, since the API will be used by ecosystem-specific reusable workflows, is the plan for "arch" to be an input to the API, set by trusted reusable workflows?
I'm not sure yet. You can if you use the Go package and add it to invocation.environment. Based on #3:
r := slsa.NewWorkflowRun(subjects, ghContext)
r.Invocation.Environment["arch"] = /* ... */
p, _ := slsa.HostedActionsProvenance(r)
For provenance-only, it could be an input to the workflow but we could also just not provide it. I'm not sure I want us to allow a bunch of inputs for things we can't determine on our own but I don't know. I could probably be convinced either way.
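To make the "optional input" option concrete, here is a minimal sketch of recording arch only when a caller actually supplies it. The WorkflowRun and Invocation types below are hypothetical stand-ins defined just for this example, not the real package API from #3, and SetArch is an assumed helper name.

```go
package main

import "fmt"

// Invocation and WorkflowRun are HYPOTHETICAL stand-ins for the types
// discussed in this thread; they exist only to keep the example runnable.
type Invocation struct {
	Environment map[string]interface{}
}

type WorkflowRun struct {
	Invocation Invocation
}

// SetArch (assumed name) records arch in the environment only when the
// caller supplied a value. An empty value leaves the key unset instead of
// guessing. It reports whether the key was written.
func SetArch(r *WorkflowRun, arch string) bool {
	if arch == "" {
		return false
	}
	if r.Invocation.Environment == nil {
		r.Invocation.Environment = map[string]interface{}{}
	}
	r.Invocation.Environment["arch"] = arch
	return true
}

func main() {
	r := &WorkflowRun{}
	SetArch(r, "") // provenance-only caller that provides nothing
	fmt.Println(len(r.Invocation.Environment))
	SetArch(r, "amd64") // trusted builder that knows its runner
	fmt.Println(r.Invocation.Environment["arch"])
}
```

Leaving the key absent when nothing is supplied avoids asserting an architecture the generator cannot verify.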
- Only populate statements that are non-forgeable... this falls apart since the artifact hash itself is forgeable :/
- Add a "statement of non-forgeability" to accompany non-forgeable statements; but I doubt this is doable in the current specs
The artifact hash can be used as input from the build (and I assume the subject name as well? though it's not mentioned). This was the reasoning for us taking the subjects and hashes as input in the first place. https://slsa.dev/spec/v0.1/requirements#service-generated
The following provenance fields MAY be generated by the user-controlled build steps:
The output artifact hash from Identifies Artifact. Reasoning: This only allows a “bad” build to falsely claim that it produced a “good” artifact. This is not a security problem because the consumer MUST accept only “good” builds and reject “bad” builds.
We should do whatever we can to avoid putting anything else into the provenance that comes from the build step. So I think that means that the arch is out as an input.
I don't see a need to put the arch key in environment. There are three purposes of provenance, in priority order:
1. Enable policy decisions given only the provenance. I don't anticipate anyone using arch for this (or environment at all, except in the case of an overlooked parameter).
2. Enable reproducibility given the provenance and all original inputs. arch can be determined from the commit, so it doesn't need to go in the provenance.
3. Enable ad-hoc investigations. I'm not sure why arch would be needed for this.
we need to decide how we convey the fact that some statements in the provenance are forgeable (taken from user) and some are not (repo commit hash)
Yes, I'd like to solve this in the spec. I don't have any good ideas though.
Yeah, there is a smell of us wanting to add arch without actually understanding the use case. It's in the provenance examples for Github actions and was in the original slsa-go repo so just got carried over.
I also think it's fine for us to not have it. Especially for the provenance-only workflow since it's not really discoverable anyway.
2. Enable reproducibility given the provenance and all original inputs.
arch can be determined from the commit, so it doesn't need to go in the provenance.
Not sure I totally follow how the commit would have it, unless you mean it's implied from the source and build steps or the runner that was triggered by the commit?
That you can figure out the architecture from the workflow file. For example, GitHub hosted runners currently all run on amd64, and if there is another architecture in the future, it will be specified in the workflow file.
Yeah, still it might be convenient to have in the provenance. The thing I worry about maybe more than arch is whether self-hosted runners were used or not. It seems to me that it might have security implications that folks might want to put into policy (e.g. I don't trust your self-hosted runners). Also, since they are selected via labels, I think it's relatively easy to accidentally create a workflow that selects self-hosted runners you didn't intend to select.
https://docs.github.com/en/actions/using-workflows/reusing-workflows#using-runners
As with arch, it also seems to be really hard to tell what kind of runner the workflow ran on after the fact. Aside from the opaque node_id it's hard to tell where a workflow was actually run as runs-on labels aren't present on the workflow or workflow run. It seems like you'll have to check out the repo at the right ref and parse it out of the workflow yaml.
https://docs.github.com/en/rest/actions/workflows#get-a-workflow
https://docs.github.com/en/rest/actions/workflow-runs#get-a-workflow-run
That said, this is probably more of an "enterprise" use case rather than an OSS use case since I imagine most projects would be using hosted runners exclusively. So it may be low priority for us at the moment.
Revisiting this: can we add the runner context https://docs.github.com/en/actions/learn-github-actions/contexts#runner-context directly into the provenance?
We would need to record the runner context of the build step in the reusable workflow, right? That would require https://docs.github.com/en/rest/actions/workflow-runs#get-a-workflow-run as Ian said: we can't simply grab the context inside the build step because that would be untrusted.
we can't simply grab the context inside the build step because that would be untrusted.
It would work for builders (Go), but not generic generators?
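For the builder case, where the trusted workflow runs the build in its own job, capturing runner details might look something like the sketch below. It reads the default RUNNER_OS, RUNNER_ARCH and RUNNER_NAME environment variables that GitHub Actions sets in each job; the RunnerEnvironment helper and the output key names ("os", "arch", "name") are assumptions made for illustration. The values describe only the job the code runs in, so in a provenance-only workflow they would describe the generator's runner, not the one that built the artifact.

```go
package main

import (
	"fmt"
	"os"
)

// RunnerEnvironment is a hypothetical helper. It collects runner details
// from the default GitHub Actions environment variables, skipping any that
// are unset. getenv is injected so the function can be tested off-runner.
func RunnerEnvironment(getenv func(string) string) map[string]string {
	// Output key names are assumptions, not a defined provenance schema.
	vars := map[string]string{
		"os":   "RUNNER_OS",
		"arch": "RUNNER_ARCH",
		"name": "RUNNER_NAME",
	}
	env := map[string]string{}
	for key, name := range vars {
		if v := getenv(name); v != "" {
			env[key] = v
		}
	}
	return env
}

func main() {
	// Outside of GitHub Actions these variables are normally unset,
	// so this prints an empty map.
	fmt.Println(RunnerEnvironment(os.Getenv))
}
```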