slsa-github-generator
slsa-github-generator copied to clipboard
Review fields in `invocation.environment`
The SLSA v0.2 provenance format describes invocation.environment as follows:
Any other builder-controlled inputs necessary for correctly evaluating the build. Usually only needed for reproducing the build but not evaluated as part of policy.
This SHOULD be minimized to only include things that are part of the public API, that cannot be recomputed from other values in the provenance, and that actually affect the evaluation of the build. For example, this might include variables that are referenced in the workflow definition, but it SHOULD NOT include a dump of all environment variables or include things like the hostname (assuming hostname is not part of the public API).
Let's review which GitHub environment variables we are including to make sure we are complying with the spirit of the spec here. Most notably, I don't think we should be dumping the full payload in environment. Do we have a strong argument for doing so, or were we just covering all bases by doing so?
I'm also curious about the "not evaluated as part of policy" phrasing in the provenance format spec too. We are capturing some fields which could provide valuable input to a policy evaluation (i.e. github_actor and github_repository_owner), where is the more appropriate place to include such fields? The only other mention of policy in the provenance format is in invocation.configSource.entryPoint – is configSource a better place to capture the GitHub variables? cc @MarkLodato @TomHennen
I'm also curious about the "not evaluated as part of policy" phrasing in the provenance format spec too. We are capturing some fields which could provide valuable input to a policy evaluation (i.e.
github_actorandgithub_repository_owner), where is the more appropriate place to include such fields? The only other mention of policy in the provenance format is ininvocation.configSource.entryPoint– isconfigSourcea better place to capture the GitHub variables? cc @MarkLodato @TomHennen
We need to clarify this in v1.0 of the spec. Mind filing an issue there?
Our original intention of "policy" was to ensure that the package was built from the intended source and build process. For example, the Python package "pyyaml" is supposed to be built from https://github.com/yaml/pyyamls by the (hypothetical) GitHub actions workflow ".github/workflows/build.yaml". To verify this, we need to do two things (besides checking hashes, signatures, etc.):
- Does the
invocation.configSourcematch the expected value? - Are we confident that there are no
invocation.parametersthat can be used to inject malicious behavior? (This is the "Parameterless" requirement, which needs further refinement.)
By definition, the values in invocation.environment MUST NOT allow a user to inject behavior. Therefore, there should be no need to check for them. Things like github_actor or github_repository_owner are irrelevant for the policy above. I could see how someone might want to check them, but I'm wary about adding lots of stuff to the provenance unless we have a clear use case for how it will be used.
I'm also curious about the "not evaluated as part of policy" phrasing in the provenance format spec too. We are capturing some fields which could provide valuable input to a policy evaluation (i.e.
github_actorandgithub_repository_owner), where is the more appropriate place to include such fields? The only other mention of policy in the provenance format is ininvocation.configSource.entryPoint– isconfigSourcea better place to capture the GitHub variables? cc @MarkLodato @TomHennenWe need to clarify this in v1.0 of the spec. Mind filing an issue there?
Done: slsa-framework/slsa#394
Our original intention of "policy" was to ensure that the package was built from the intended source and build process. For example, the Python package "pyyaml" is supposed to be built from https://github.com/yaml/pyyamls by the (hypothetical) GitHub actions workflow ".github/workflows/build.yaml". To verify this, we need to do two things (besides checking hashes, signatures, etc.):
- Does the
invocation.configSourcematch the expected value?- Are we confident that there are no
invocation.parametersthat can be used to inject malicious behavior? (This is the "Parameterless" requirement, which needs further refinement.)
That description matches my understanding. The provenance format is referring specifically to policy which evaluates whether SLSA integrity properties have been met, not generic/custom policy to evaluate arbitrary fields in the provenance. The latter kind of policy may be extremely useful but is more likely to be application specific, whereas the core SLSA integrity properties can be evaluated in a more automated/generic way.
By definition, the values in
invocation.environmentMUST NOT allow a user to inject behavior. Therefore, there should be no need to check for them. Things likegithub_actororgithub_repository_ownerare irrelevant for the policy above. I could see how someone might want to check them, but I'm wary about adding lots of stuff to the provenance unless we have a clear use case for how it will be used.
Agreed.
For this issue, I think it's worth dropping the full payload (easier to add it later if it's needed). However, the spec does state that invocation.environment is "an arbitrary JSON object with a schema defined by buildType." so we have flexibility here.
Regarding the full payload. The reasoning for adding it is that it's easier for the verifier to take a call whether they want to verify the data, without the need for the builder to understand all corner cases of GitHub triggers. If we were to cherry pick some payload fields, then small implementation bugs in the builder would make older builds not "verifiable" by a consumer. Making the builder as "simple" as possible and pushing the logic to the consumer is what drove us to include the full payload.
One source of bugs for the builder is that there are many GitHub triggers, all with their own special cases and corner cases (e.g., workflow_dispatch does not have a payload, certain ref are empty for certain triggers, etc). This becomes hard to maintain, and prone to mistakes - We keep hitting corner cases in scorecard, for example. Since the GitHub payload format is stable, consumers can use it the way they want to, without the need for the builder to be "smart". I think this makes it simpler for consumers (or the verifier) to support new features without a giant switch statement to distinguish between builder versions.
Today, we do use the payload during verification for pushed tags. The ref are empty in this case, so the verifier uses the information from the payload to infer the branch the tag corresponds to.
I know it's very verbose, but it does add simplicity to builders.
I'm always worried we'll receive feature requests that will eventually complicate the code, etc.
Happy to re-visit this decision if we can keep the builders simple, though.
fyi, here's a bug I just discovered https://github.com/slsa-framework/slsa-verifier/issues/88 that illustrates the special cases of triggers.
In addition to what @laurentsimon wrote, I think the event payload is important for reproducing the build. Much of Github Actions workflows depend on the payload and the info contained in it, so without it you're likely not to have enough info to reproduce the build. Also, the event payload is part of the Github public API and have a documented format. So my personal thought is that including it doesn't necessarily go against the spirit of the invocation environment.
That said, we have noticed that some things like repository id might be needed to definitively identify a Github repository. i.e. you could have situations where a user deletes a repo or their account and that account and/or repo are recreated by someone else. I'm not sure they'll be able to recreate a github digest in order to completely circumvent configSource checks but it's at least something to think about. If we really want to avoid environment being used in policy, we might want to think about putting repository id in the configSource.
Thanks for taking the time to write up the rationale. Mulling this over, but it feels like we should leave as-is.