slsa-github-generator
[feature] generate a provenance file per artifact file
SLSA 1.0 recommends generating one provenance file per artifact file, rather than a single file with multiple artifact signatures. https://slsa.dev/spec/v1.0/distributing-provenance
The provenance SHOULD have a filename that is directly related to the build artifact filename. For example, for an artifact <filename>.<extension>, the attestation is <filename>.attestation or some similar extension (for example in-toto recommends <filename>.intoto.jsonl.)
A provenance-per-artifact parameter could be added to enable this. When it is set to true, it would generate individual provenance files for each artifact. It may make sense to default it to true at some point.
https://github.com/slsa-framework/slsa-github-generator/issues/1565 is an earlier discussion on adding new artifacts to an existing multiple.intoto.jsonl file. It is presumably superseded by the new recommendation, although it could still be possible to support it as well.
Thanks for creating this issue!
Just to be clear, I assume you are referring to the generic generator generator_generic_slsa3.yml?
If multiple artifacts are produced by a single build, do you think this option should make separate copies of the provenance where just the subject is different?
I'm not sure it's true that there is a specific recommendation that a provenance file and an artifact file have a 1:1 relationship, though maybe @MarkLodato, @kpk47, or @di could chime in, as they were the ones who worked on that document. My understanding is that this is a recommendation about the filename, so that the provenance can be logically linked to an artifact or set of artifacts based on its filename, not necessarily a recommendation about the mapping of artifact to provenance file.
In the meantime you can simply call the workflow more than once with a single subject if you want a single file per artifact.
Yes, I had the generic generator in mind, it's what we use for Python projects right now. I'm not an expert in all these processes, so I can't say for sure about these questions. Right now, the workflow requires generating a list of sha256 hashes, one per artifact file. Instead of generating one multiple.intoto.jsonl file, I'd expect the provenance-per-artifact setting would generate one provenance file per line in the list of hashes.
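A minimal sketch of that idea, not the generator's actual implementation: it hashes each file, emits one sha256sum-style line per artifact, and plans one provenance file per line. The helper names `sha256_lines` and `provenance_plan`, and the `<artifact>.intoto.jsonl` naming, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_lines(paths):
    """Return one 'HEX  NAME' line per file, in the style of sha256sum output."""
    lines = []
    for path in sorted(Path(p) for p in paths):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.name}")
    return lines


def provenance_plan(lines, suffix=".intoto.jsonl"):
    """Map a hypothetical per-artifact provenance filename to its single hash line."""
    plan = {}
    for line in lines:
        # Split on the first run of whitespace so names containing spaces survive.
        _digest, name = line.split(maxsplit=1)
        plan[name + suffix] = line
    return plan
```

With 48 wheels in the hash list, `provenance_plan` would yield 48 entries, each holding exactly one subject, instead of a single multiple.intoto.jsonl.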
IIRC, we were thinking about multi-platform builds with that recommendation. For example, if the build generates artifact_x86.gz and artifact_amd64.gz, then it should also produce artifact_x86.attestation and artifact_amd64.attestation.
Re-reading the spec now, it looks like we were a bit inconsistent. Distributing Provenance (the page you linked) says each artifact should have an associated provenance file, but the Terminology page (https://slsa.dev/spec/v1.0/terminology#build-model) implies that provenance can describe multiple outputs.
What is your use case? In my mind, we attach provenance to a package artifact (https://slsa.dev/spec/v1.0/terminology#package-model -- sorry for the jargon), and a build process may produce multiple package artifacts. I'd be curious to learn about your workflow if it's producing multiple artifacts that aren't independent packages (e.g. a binary with main() and a library linked into the main() binary).
From my understanding, @di indicated that when PyPI eventually supports uploading provenance, it would make more sense to have one per file. Python does have platform-specific wheel files as well, and it's possible to build more of them for new Python versions.
I'm not super familiar with PyPI, but is this example demonstrative? https://packaging.python.org/en/latest/tutorials/packaging-projects/#generating-distribution-archives
For a build that generates
dist/
├── example_package_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl
└── example_package_YOUR_USERNAME_HERE-0.0.1.tar.gz
I would expect the provenance files to look something like this:
dist/
├── example_package_YOUR_USERNAME_HERE-0.0.1-py3-none-any.attestation
└── example_package_YOUR_USERNAME_HERE-0.0.1.attestation
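The mapping in the listing above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `attestation_name`, the `KNOWN_EXTENSIONS` list, and the choice to replace the extension (rather than append to it) are hypothetical and not part of the spec or the generator.

```python
from pathlib import PurePosixPath

# Hypothetical list of extensions to strip. Multi-part extensions must come
# before their single-part tails so ".tar.gz" is matched before ".gz".
KNOWN_EXTENSIONS = (".tar.gz", ".whl", ".gz")


def attestation_name(artifact: str, suffix: str = ".attestation") -> str:
    """Derive an attestation filename from an artifact path."""
    name = PurePosixPath(artifact).name
    for ext in KNOWN_EXTENSIONS:
        if name.endswith(ext):
            # Replace the known extension with the attestation suffix.
            return name[: -len(ext)] + suffix
    # Unknown extension: keep the full name and append the suffix.
    return name + suffix
```

One design caveat: replacing the extension can collide when two artifacts differ only by extension (for example a .tar.gz and a .zip of the same release), whereas appending the suffix to the full filename keeps names unique.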
We had a related discussion on this requirement:
The build attestation SHOULD have a filename that is directly related to the build artifact filename.
The guideline doesn't say that a provenance file should have a single subject (an artifact file). It says that from the provenance filename we should be able to identify the artifact. So the provenance file can have multiple subjects and be copied per artifact file.
But right now, you can't do that. The filename multiple.intoto.jsonl doesn't say what artifacts it's related to, and doesn't encompass "all artifacts" if another artifact is built later, which is common in Python. And in the future where provenance is uploaded to PyPI, it would have to support detecting which artifacts are in which provenance files, rather than a simple match by filename.
FWIW, the provenance-name input allows you to set the name of the attestations file to whatever you want, so you could set it to <package-name>.intoto.jsonl yourself.
That's not what I'm looking for. It takes about 1 minute for the generator workflow to run once. MarkupSafe builds 48 wheels in one step, so even if I wanted to run the workflow multiple times, I'd still need to figure out some way to produce 48 workflow runs based on the output of the build step, and be willing to wait much longer for the release to finish. That's not ideal; it seems like something that's much easier to solve with an option in the generator.
Here's the publish run for MarkupSafe: https://github.com/pallets/markupsafe/actions/runs/3941915297 and the list of files in the GitHub release: https://github.com/pallets/markupsafe/releases/tag/2.1.2
I'm not trying to debate the exact language of the spec. You all have pointed out that it doesn't require one provenance file per artifact, and I agree that's what "should" means. Then again, it doesn't say that one provenance file per artifact is not allowed or unreasonable.
Just to be clear, I think this is a fine and valid request. My intention wasn't fishing for ways to invalidate it and I think it would be good to implement. I agree it would take a lot longer for the workflows to run if you had to make a separate reusable workflow call for each file.
I was mostly commenting with some workarounds that anyone reading the issue might be able to use until this is implemented.
👋 hey @davidism, nice to see you here :)
To be clear, https://slsa.dev/spec/v1.0/distributing-provenance is about re-distributing provenance from the perspective of a software repository like PyPI, not an end user like yourself. The idea is that an installer/verifier can easily take an artifact like Flask-2.3.2.tar.gz and find the corresponding provenance for that artifact at a filename like Flask-2.3.2.tar.gz.slsa.
That said, I don't think it means that the creator of the artifact(s) necessarily needs to generate one provenance file per artifact -- as long as the re-distributor can know how to break apart a .jsonl file or some other aggregating file to create those "one artifact per file" mappings, I think that's fine.
That said, the provenance generator should probably be able to do the same thing, so maybe that's just a flag that modifies the output?
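As a rough sketch of what "breaking apart" an aggregated file could look like: the code below assumes each line of the .jsonl file is a DSSE-style envelope whose base64 `payload` decodes to an in-toto statement with a `subject` list of named entries. That layout, the function names, and the `<artifact>.intoto.jsonl` output naming are assumptions for illustration, not the generator's or any repository's actual behavior. Each per-artifact file receives an unmodified copy of the envelope, so any signature over the payload is left intact.

```python
import base64
import json
from pathlib import Path


def subjects_in_envelope(envelope: dict) -> list[str]:
    """Return the subject names listed in an envelope's decoded payload."""
    statement = json.loads(base64.b64decode(envelope["payload"]))
    return [subject["name"] for subject in statement.get("subject", [])]


def split_per_artifact(jsonl_path, out_dir, suffix=".intoto.jsonl"):
    """Write one provenance file per subject; return the sorted output names."""
    per_artifact = {}
    for line in Path(jsonl_path).read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        for name in subjects_in_envelope(json.loads(line)):
            # Copy the original line verbatim; the statement still lists all
            # subjects, only the filename is specific to one artifact.
            per_artifact.setdefault(Path(name).name + suffix, []).append(line)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for filename, lines in per_artifact.items():
        (out_dir / filename).write_text("\n".join(lines) + "\n")
    return sorted(per_artifact)
```

A verifier would still need to check that the artifact's digest appears among the statement's subjects; the filename only makes the file easy to locate.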