slsa-github-generator icon indicating copy to clipboard operation
slsa-github-generator copied to clipboard

Suppress provenance non-fully-rendered fields in github_event_payload

Open chipzoller opened this issue 3 years ago • 3 comments
trafficstars

I'm noticing in both the generic provenance generator and the Go generator that certain fields in the environment object are included even if they have portions of the templated values which are not rendered. For example, I see a number of these fields which still have {} template references in the path predicate.invocation.environment.github_event_payload.repository.

{
  "allow_forking": true,
  "archive_url": "https://api.github.com/repos/chipzoller/zulu/{archive_format}{/ref}",
  "archived": false,
  "assignees_url": "https://api.github.com/repos/chipzoller/zulu/assignees{/user}",
  "blobs_url": "https://api.github.com/repos/chipzoller/zulu/git/blobs{/sha}",
  "branches_url": "https://api.github.com/repos/chipzoller/zulu/branches{/branch}",
<snip>

Because the full value cannot be substituted due to lack of inputs of those portions of the template, and therefore these portions of the provenance are basically useless, would it perhaps be better to suppress these fields from the generated provenance if the complete value is not rendered? Alternatively if they must remain, if the full value cannot be completely rendered maybe defaulting the value to something consistent like either null or none? Using one of these two may help to provide better programmatic parsing by downstream tools to identify the bits of information that are actually useful in validating (or invalidating) the provenance of an artifact.

chipzoller avatar Jun 20 '22 02:06 chipzoller

The data included there is the full value of the event object that triggered the workflow. You can see the various payloads that should be there here: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads

We don't render templated values in those fields, it's simply the event data given to the workflow run. I would also have assumed that the values there would be the urls for the specific repository and not template strings, but it seems that's the format of the repository object in the GitHub API. You can see an example response here: https://api.github.com/repos/octocat/Hello-World

I think most of them denote optional parts of the URL. For example, https://api.github.com/repos/octocat/Hello-World/issues/comments{/number} refers to both https://api.github.com/repos/octocat/Hello-World/issues/comments and URLs like https://api.github.com/repos/octocat/Hello-World/issues/comments/1146825

ianlewis avatar Jun 20 '22 05:06 ianlewis

So to be clear, I think we're doing the right thing in including them since those are the values given in the event payload, and would potentially be needed to reproduce the build but I'm open to suggestions about what might be right to include. See #196 for more discussion.

ianlewis avatar Jun 20 '22 05:06 ianlewis

Ok, thanks for the additional info. I don't think it's a big deal either way, just that, as a user/consumer of this information, it's useless to me because it isn't "complete" and so therefore seems to add noise to the signal.

chipzoller avatar Jun 20 '22 13:06 chipzoller