[Roadmap] Implement PEP 740
PEP 740 is in a final but not yet approved state. This issue is intended to lay out the dependencies/subcomponents of its implementation, including things that can be done in a preliminary manner.
Index side
- [x] Support the `attestations` form field in the current legacy upload endpoint
  - [x] https://github.com/pypi/warehouse/pull/15952
- [x] Parse each received attestation from `attestations` and verify each against the uploading trusted publisher; fail the file upload if any attestations are invalid
  - This should be done in as generic a manner as possible, to ensure that future extension to email and key identities is possible.
  - [x] https://github.com/pypi/warehouse/pull/15952
  - [x] https://github.com/pypi/warehouse/pull/16304
  - [x] https://github.com/pypi/warehouse/pull/16302
- [x] Persist verified attestations and expose them via the PEP 503 and Simple JSON API indices (via provenance objects, as specified in PEP 740)
  - [x] https://github.com/pypi/warehouse/issues/14799
  - [x] https://github.com/pypi/warehouse/issues/14727
  - [x] https://github.com/pypi/warehouse/pull/16546
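As a rough sketch of the verification step in the second item above (all names here, including `AttestationError` and the injected `verify_attestation` callback, are hypothetical stand-ins, not Warehouse's actual API):

```python
import json


class AttestationError(Exception):
    """Raised when any uploaded attestation fails verification (hypothetical name)."""


def process_attestations(form_value, verify_attestation, publisher):
    # The `attestations` form field is a JSON array of PEP 740 attestation objects.
    attestations = json.loads(form_value)
    if not attestations:
        raise AttestationError("malformed attestations field: empty array")
    verified = []
    for att in attestations:
        # Verify each attestation against the Trusted Publisher identity that
        # authenticated this upload; any single failure rejects the whole file.
        if not verify_attestation(att, publisher):
            raise AttestationError("invalid attestation; rejecting file upload")
        verified.append(att)
    return verified
```

The key property is the all-or-nothing behavior: one bad attestation fails the entire file upload, which keeps the persisted provenance trustworthy.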
Uploader/publish side
- [x] Support attestation generation in `gh-action-pypi-publish`
  - Concretely, this means `gh-action-pypi-publish` should use `sigstore-python` to sign the attestation payload defined in PEP 740
  - The resulting bundle should be transformed into a PEP 740 attestation object
  - [x] https://github.com/pypa/gh-action-pypi-publish/pull/236
  - [x] https://github.com/pypa/gh-action-pypi-publish/pull/245
- [x] Support attestation uploading in `twine`: https://github.com/pypa/twine/issues/1094
  - [x] https://github.com/pypa/twine/pull/1095
  - [x] https://github.com/pypa/twine/pull/1098
  - [x] https://github.com/pypa/twine/pull/1099
  - [x] https://github.com/pypa/twine/pull/1101
  - [x] https://github.com/pypa/twine/pull/1107
- [x] Once `twine` supports uploading, `gh-action-pypi-publish` should use that support to actually upload the attestation objects it generates above
- [ ] Update all relevant docs (PyPI's docs, PyPUG)
  - [x] ~~https://github.com/pypi/warehouse/pull/16063~~
  - [ ] https://github.com/pypi/warehouse/pull/16398
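For illustration, the upload side could attach attestations roughly like this; `build_upload_data` is a hypothetical helper, but the single `attestations` form field carrying a JSON array is the PEP 740 mechanism referenced above:

```python
import json


def build_upload_data(metadata, attestation_objects):
    # PEP 740 reuses the legacy upload endpoint: attestations travel alongside
    # the usual metadata fields in one `attestations` form field whose value
    # is a JSON array of attestation objects.
    data = dict(metadata)
    if attestation_objects:
        data["attestations"] = json.dumps(attestation_objects)
    return data
```

This keeps the change backwards-compatible: uploaders that don't generate attestations simply omit the field.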
CC @di @webknjaz @facutuesca for visibility
I've gone ahead and filed a tracking issue in pip-tools as well: https://github.com/jazzband/pip-tools/issues/2080.
> Concretely, this means `gh-action-pypi-publish` should use `sigstore-python` to sign the attestation payload defined in PEP 740
With the exciting changes suggested in https://github.com/pypa/gh-action-pypi-publish/pull/230, it looks like we'll be able to leverage bits of composite actions and even call the action itself (if it makes sense to do so, as an alternative to using a Python library within the container like the rest of the tooling).
> Concretely, this means `gh-action-pypi-publish` should use `sigstore-python` to sign the attestation payload defined in PEP 740
Should this be done by the uploader, though? My initial feeling was that this should be done right after the job building the artifacts. Though, perhaps not by pypa/build (because of the privilege elevation concerns I often complain about). Are we going to treat the artifacts that the uploader gets as coming from trusted sources? Should we be emphasizing the security considerations and teaching the users more about the least privilege principle?
> Should this be done by the uploader, though?
Hmm -- I suppose it doesn't have to be, although for targeting publish provenance (and getting provenance generation exposed to the largest number of users) I think it's the most straightforward path.
Long term, PEP 740 is designed to enable multiple attestations: the building step(s) can generate one, as can the publishing step. But I believe getting that into a workable state will require a decent bit more coordination, so I want to target an MVP of "the artifact is signed by the same identity that pushed it to the index" 😅
@woodruffw fair! I just wanted to make sure we're on the same page.
@webknjaz Glad you're thinking about build provenance already, I'm interested in tackling build provenance for Python too :rocket:
These initial attestations are similar to NPM's publish provenance, so they're best generated by the uploader. If the attestation can show "this artifact was uploaded by this workflow, which matches a Trusted Publisher configuration for the project", we can already build some interesting things like verified GitHub URLs, exposing which artifacts have Trusted Publishers configured to users, monitoring for changes in provenance, etc.
@sethmlarson fair enough! I remember similar ideas being brought up during the initial private beta of trusted publishing. It's nice to have some more or less specified goals recorded and synchronized, though, which is why I posted my thoughts :)
Looking back, I'm glad that I encouraged implementing it in my action from the very beginning, months before it went GA, so everybody could start using it right away with almost no changes. So I'm hoping that the Sigstore integration happens in a similar fashion, enabling any early adopters to start uploading signatures with zero changes to their workflows (if they already use trusted publishing, they have the OIDC access set up).
> Should this be done by the uploader, though? My initial feeling was that this should be done right after the job building the artifacts.
I think your initial feeling is valid: since it's not guaranteed that the workflow using the publish action is also the workflow that built the artifacts, this isn't necessarily "build provenance".
That said, having an attestation ('publish provenance') from the uploader in the absence of any other attestation being provided to it would still be very valuable here I think, especially for projects that aren't using Trusted Publishing yet.
@woodruffw What behavior do we want when uploading attestations to a file already in PyPI? Currently, the behavior for duplicate files is either stop the upload process and return OK (if the new file is the same as the existing one), or fail (if the files don't match). How do we want to deal with those cases when the request also includes attestations?
> What behavior do we want when uploading attestations to a file already in PyPI? Currently, the behavior for duplicate files is either stop the upload process and return OK (if the new file is the same as the existing one), or fail (if the files don't match).
For the MVP, I think we want to skip if the distribution file already exists, for consistency with the existing upload API behavior.
Longer term, this probably falls under the "Upload 2.0 API" PEP -- we'll want a way to augment existing uploaded distributions with new attestations. But that's out of scope for now, IMO 🙂
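A minimal sketch of the MVP duplicate-handling decision described above (the function name and return labels are illustrative, not Warehouse's actual responses):

```python
def handle_duplicate_upload(existing_digest, new_digest):
    # Mirrors the existing duplicate-file rules, extended to attestations:
    # for the MVP, an identical re-upload is skipped entirely, so any
    # attestations accompanying it are also ignored (no augmenting of
    # already-uploaded files yet).
    if existing_digest is None:
        return "process-upload"        # new file: verify and store attestations
    if existing_digest == new_digest:
        return "skip-and-return-ok"    # identical re-upload: skip, incl. attestations
    return "fail-upload"               # mismatched contents: reject
```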
FYI: there's some new developments within GitHub, it seems — https://github.com/pypa/gh-action-pypi-publish/pull/236#issuecomment-2140972463.
> * [ ] Update PyPUG and adjacent documentation to remove references to `gh-action-sigstore-python`
I think this one is somewhat separate — it signs built artifacts (in that snippet) while the pypi-publish integration signs whatever's uploaded (no matter if it was built within the workflow or downloaded).
Something that @haydentherapper raised: Trusted Publishing supports private repos.
This isn't a "disclosure" issue at the moment since we don't expose the Trusted Publisher metadata anywhere, but with PEP 740 we will.
Some points:
- No matter what, we will continue supporting Trusted Publishing from private repos.
- Conceptually, there's nothing stopping someone from generating a private attestation, although its semantics are unclear (since a publish or build attestation from a private repo can't really be analyzed by third parties, beyond a basic "GitHub promises this is right" check).
- There's a (fair) PII concern about leaking a private repository name.
Some engineering details:
- Both GitLab and GitHub provide a visibility type claim, which either PyPI or the publishing workflow could filter on in terms of including attestations. It should probably be PyPI though, since PyPI is already handling the JWT claims.
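A sketch of that PyPI-side filtering, assuming GitHub-style `repository_visibility` and GitLab-style `project_visibility` claim names (the exact claim names and their possible values should be confirmed against each provider's OIDC token documentation):

```python
def attestations_allowed(claims, allow_private=False):
    # GitHub Actions tokens carry a `repository_visibility` claim and GitLab
    # CI tokens a `project_visibility` claim (assumed names); treat a missing
    # claim conservatively, as if the repo were private.
    visibility = (
        claims.get("repository_visibility")
        or claims.get("project_visibility")
        or "private"
    )
    return visibility == "public" or allow_private
```

The `allow_private` flag corresponds to the explicit opt-in discussed below (e.g. a per-project setting on PyPI).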
Per NPM's docs, it looks like they require a public repo for provenance:
> Ensure your package.json is configured with a public repository that matches (case-sensitive) where you are publishing with provenance from.
This doesn't quite mesh with PyPI's model since each distribution file has its own metadata, but in theory we could check them all for consistency. But that is potentially more brittle than just rejecting attestations if the accompanying OIDC JWT doesn't have the right visibility.
From discussion in the Sigstore Slack: one route forward here is to reject attestations from private repositories, unless the project is explicitly configured (either on PyPI or in the project metadata) to allow them. This eliminates the risk of a user inadvertently leaking a private repository name (or similar) through their Trusted Publisher configuration and subsequent publish attestations.
If configured on PyPI, this could be something like a new checkbox under /manage/<project>/settings, e.g. "allow attestations from private Trusted Publishers."
If configured via project metadata, things get a little hairier (since the metadata is duplicated once per distribution, and there's no free-form metadata at the moment). So doing it via PyPI is probably easier.
@facutuesca and I had a conversation about the persistence side of this the other day; summarizing:
- For the time being, we should treat publish provenance as a baseline: in other words, PyPI can accept SLSA Provenance as well if provided, but every attestation set should come with a Publish Provenance attestation at the minimum (since they're all coming from Trusted Publishing)
- With Publish Provenance as a baseline, we can add a `File <-> OIDCPublisher` relationship that tracks the Trusted Publisher that was verified to have published the given `File`
  - That relationship then allows us to bootstrap verified URLs, etc.
Some related thoughts:
- Having a `File <-> OIDCPublisher` relationship means that `OIDCPublisher`s are never deleted once one or more files are published against them, even if no `Project` has that `OIDCPublisher` registered as an active publisher anymore. This may or may not be an issue; just highlighting as a "lifetime" change.
- We need to determine the appropriate "policy" for rendering things like provenance markers on the project-level page (i.e. `/p/<project>`): in addition to verified URLs, we might want a little checkmark similar to NPM's that says the project has provenance. For that, it probably makes sense to require all files in the latest release to have publish attestations, even if the data layer underneath doesn't prohibit "heterogeneous" uploads.
> Having a `File <-> OIDCPublisher` relationship means that `OIDCPublisher`s are never deleted once one or more files are published against them, even if no `Project` has that `OIDCPublisher` registered as an active publisher anymore
This might add some complexity, since right now there is no way of modelling an "inactive" publisher: if a publisher exists in the DB, then it's active. Do we want to add a new column to the table, an `is_active` field that becomes `False` if the user deletes the corresponding Trusted Publisher? (More specifically, it becomes `False` iff the user deletes it and there exists at least one file that was uploaded with a publish attestation signed by that publisher.)
> This might be problematic, since right now there is no way of modelling an "inactive" publisher: if a publisher exists in the DB, then it's active.
Hmm, I hadn't thought of this. A new is_active column makes sense to me, or alternatively we could define an is_active helper that checks whether len(projects) != 0. I think the latter would work approximately as well, without requiring a DB migration, although I might have missed something 🙂
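A minimal sketch of the helper variant (purely illustrative; Warehouse's real `OIDCPublisher` model is a SQLAlchemy model and more involved):

```python
class OIDCPublisher:
    # Sketch of the `is_active`-as-helper idea: no new DB column or migration;
    # a publisher simply counts as active while at least one Project still has
    # it registered.
    def __init__(self):
        self.projects = []

    @property
    def is_active(self):
        return len(self.projects) != 0
```

The trade-off (raised below) is that every call site must remember to check `is_active`, whereas a column or event record makes the state explicit in the data.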
> With Publish Provenance as a baseline, we can add a `File <-> OIDCPublisher` relationship that tracks the Trusted Publisher that was verified to have published the given `File`
I was imagining a separate model for a "verified attestation" that's separate from the publisher, due to the publisher potentially changing over time.
> I was imagining a separate model for a "verified attestation" that's separate from the publisher, due to the publisher potentially changing over time.
I think there are two separate pieces of state here:
1. The `OIDCPublisher` that uploaded the `File`, which we'd track with the relationship noted above
2. Zero or more attestations associated with the `File`, which we'll need to track with a new model (`AttestationFile`? `Attestation`?) and will also need a `File` relationship
(2) may also need a relationship to the "thing" that can verify it, which for now will always be an `OIDCPublisher` (but in the future could be an email address, or public key, or something else).
There might be a better way to model all of that, of course. But we'll definitely need both the verified attestation and the thing that verified it, i.e. the `OIDCPublisher` 🙂
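A toy model of those two pieces of state, using plain dataclasses instead of Warehouse's actual SQLAlchemy models (all class and field names are illustrative):

```python
from dataclasses import dataclass, field


@dataclass
class OIDCPublisher:
    # The "thing" that can verify attestations; today always a Trusted
    # Publisher, in the future perhaps an email address or a public key.
    name: str


@dataclass
class Attestation:
    # One verified PEP 740 attestation, linked to its File (by digest here,
    # for simplicity) and to the identity that verified it.
    file_digest: str
    verified_by: OIDCPublisher


@dataclass
class File:
    digest: str
    uploaded_by: OIDCPublisher  # the File <-> OIDCPublisher relationship
    attestations: list = field(default_factory=list)  # zero or more per File
```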
After further discussion with @woodruffw, we think that it might be better to verify a release's URLs once (at the time the release is created, during the first file's upload), and store the fact that the URL was verified in the release metadata. This implies:
- When creating a release, if the first uploaded file includes a publish attestation that passes verification, and the URL of the Trusted Publisher matches the release metadata, we mark those URLs as "verified" in the release metadata
- Further files uploaded to that same release won't affect the release metadata or the verified status.
- If the first file uploaded does not include a publish attestation (or the URL does not match the release metadata), then the release metadata URLs are marked as unverified. Future files uploaded to this same release will not affect the verified status.
In short, whether a release's metadata URLs are verified or not depends exclusively on the first file uploaded (and its corresponding attestation).
The decision comes from the fact that PEP 740 allows uploading releases where only some of the distributions have attestations, so whether a release has "verified" URLs can be ambiguous if some of the files have publish attestations and others don't. Instead, we choose to use the first file uploaded (the one that creates the release) as the source of truth for verifying the release's URLs.
Also, by verifying the URL once and storing the result, we skip having the File <-> OIDCPublisher relationship, and the complexities that come with having to keep "inactive" OIDCPublishers forever.
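The first-file rule above can be summarized as a small decision function (a sketch only; the names and the exact URL-matching logic are simplified):

```python
def release_urls_verified(release_urls, publisher_url, has_publish_attestation,
                          is_first_file):
    # Only the first uploaded file (the one that creates the release) decides
    # the verified status of the release's metadata URLs; later uploads to
    # the same release never change it.
    if not is_first_file:
        return None  # leave the stored status untouched
    return has_publish_attestation and publisher_url in release_urls
```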
@facutuesca @woodruffw Is there a risk in checking and updating the verified status of URLs for each artifact? It was noted somewhere that many projects use multiple build infrastructures to upload artifacts; it would be unfortunate to lose the "verified" status of metadata based on a race between build infrastructures.
> Is there a risk in checking and updating the verified status of URLs for each artifact?
I don't think there's a risk per se, but it's unfortunately not straightforward to do with Warehouse's current `File <-> Release` relationship: the `Release` contains the "reference" metadata for rendering purposes (including URLs), and it's only created once (from the first `File` for the version being uploaded). So, in effect, there's really only one set of URLs that PyPI "knows" for the entire `Release`, regardless of differences between individual `File`s.
TL;DR: to check each artifact, I think we'd need a stronger consistency guarantee about the metadata recorded in the `Release` model vs. each linked `File`, and (IIRC) that was excluded as a constraint from PEP 740 due to concerns that it would hamper adoption.
(The alternative here would be to consider the Release model's URLs mutable/subject to changes in verification state, e.g. consider them unverified at first but then mark them as verified if any File under Release can be verified. The main "risk" there is that it'll be complicated to engineer, and might result in a "chimera" render on PyPI, where some metadata comes from file A and some from file B.)
> Hmm, I hadn't thought of this. A new `is_active` column makes sense to me, or alternatively we could define an `is_active` helper that checks whether `len(projects) != 0`. I think the latter would work approximately as well, without requiring a DB migration, although I might have missed something
I don't really love this, as it requires us to make sure we check `is_active` everywhere we want to use a publisher.
I think I'd prefer storing details on the event instead, which would allow us to persist the details even if the publisher is deleted.
> (The alternative here would be to consider the `Release` model's URLs mutable/subject to changes in verification state
I think we should probably do that eventually, but it's OK to maintain the status quo for now and only consider the first file published.
Tagging up: with https://github.com/pypi/warehouse/pull/16302, we're now persisting attestations and generating provenance objects for them. That PR also contains the initial API changes, i.e. both the simple index and simple JSON APIs include provenance fields referencing the served provenance for the distribution, if present.
UPD: I merged and released support for attestations in the action.
@woodruffw so https://test.pypi.org/project/tinytext/3.8.1.dev14/#files should have attestations uploaded. How/where are they going to show up in the UI?
> How/where are they going to show up in the UI?
They'll show up in the API starting with https://github.com/pypi/warehouse/pull/16546, and from there I'll be working on UI support with @facutuesca and @DarkaMaul 🙂
Oh, does that mean that workflow/project management tools can start implementing distribution verification against it?
> Oh, does that mean that workflow/project management tools can start implementing distribution verification against it?
Unfortunately not quite -- we had to revert that last PR due to performance regressions on the simple index + some incorrect assumptions about being able to stream from PyPI's object storage 😅. We're reworking the implementation to use just the DB instead; I'll update this issue with the stack of PRs for that.