gh-action-pypi-publish
Adaptive short-lived token expiration handling
So it happened that a short-lived token in yarl started expiring before all uploads managed to complete.
This resulted in having to yank two releases: https://pypi.org/project/yarl/1.18.2/#history
The last failure started uploading things but didn't complete in 15 minutes, eventually hitting the token expiration: https://github.com/aio-libs/yarl/actions/runs/12091091291/job/33719944175#step:9:729
The last successful upload had the step with this action running for 4 minutes https://github.com/aio-libs/yarl/actions/runs/11954909206/job/33329106474#step:9:768
@woodruffw I don't know what exactly caused the slowness, but I think that maybe we should re-request that token every couple of dists if the cumulative size of the uploads is big. I suppose that adding the attestations doubles the number of uploads, which might also affect the overall upload time.
Huh, this is interesting -- FWICT all of yarl's distributions appear to be pretty small (all are under 1MB?), so I can't imagine why it would take more than a minute or two to upload them. Maybe GHA itself had a network hiccup?
(The attestations add to that size, but each is ~5KB so that should mostly be a rounding error.)
Re-requesting the token when the cumulative size is large makes sense to me; another option would be to do it on a timer, i.e. re-request if any given dist takes more than ~7 minutes to upload. I'll think a bit about how to rearchitect things here to enable that (maybe calling twine multiple times rather than one-shot?)
Yeah, I was thinking about calling twine multiple times and doing the OIDC dance before each call. Since Twine doesn't have a public API, we can't really integrate more granularly...
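A rough sketch of what that loop might look like (purely illustrative: `mint_token` stands in for the OIDC dance, the ~7-minute refresh threshold is an assumption, and `run`/`clock` are injectable only so the refresh logic can be exercised without network access):

```python
import subprocess
import time

# Hypothetical threshold: mint a fresh token once the current one is
# ~7 minutes old, well inside PyPI's 15-minute expiry window.
REFRESH_AFTER_SECONDS = 7 * 60


def upload_all(dists, mint_token, run=subprocess.run,
               clock=time.monotonic, refresh_after=REFRESH_AFTER_SECONDS):
    """Upload each dist with its own twine invocation, re-doing the
    OIDC token exchange whenever the current token nears expiry.

    `mint_token` stands in for the OIDC exchange; `run` and `clock`
    are injectable so the logic can be tested without uploading.
    """
    token, minted_at = mint_token(), clock()
    tokens_minted = 1
    for dist in dists:
        if clock() - minted_at >= refresh_after:
            token, minted_at = mint_token(), clock()
            tokens_minted += 1
        # One twine call per dist instead of a single one-shot upload.
        run(["twine", "upload", "--non-interactive",
             "-u", "__token__", "-p", token, dist], check=True)
    return tokens_minted
```

The trade-off is one subprocess spawn (and potentially one token exchange) per dist, but for release-sized file counts that overhead is negligible next to the upload time itself.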
True -- this might also be the right time for me to revisit https://github.com/pypa/twine/issues/999, since the technical blockers for that on PyPI's side are now resolved. Having the OIDC exchange baked directly into twine could potentially make it easier to refresh the token automatically.
Maybe… However, given the release responsiveness over there, I can see that causing more problems.
I'm one of the maintainers now, so it should be a bit faster 😅 -- but I can appreciate not wanting it to become a blocker. We already have all of the code here in the action, so I can certainly start by looking at things directly here.
I just hit this in Matplotlib: https://github.com/matplotlib/matplotlib/actions/runs/12095099184/job/33728735029. I don't know why the upload is so slow; when I was uploading the files myself before Trusted Publishing became a thing, it took half the time for all of them.
Four wheels (for the lesser-used PyPy, at least) and the sdist are missing. Is there anything to be done other than yanking the release? From the log, it appears that the action generated the attestations before starting the upload, but I don't know if there's some component that is on the runner that needs to be uploaded to PyPI as well, or whether I can just upload the wheels myself anyway.
I'm not sure how many people are relying on attestations since this is the first release to have them. The files are also attested on GitHub since we ran that step before the PyPI upload (though I know people have suggested to reverse these operations.)
> From the log, it appears that the action generated the attestations before starting the upload, but I don't know if there's some component that is on the runner that needs to be uploaded to PyPI as well, or whether I can just upload the wheels myself anyway.
You can upload the rest manually, but they won't have attestations. That means your release will only be partially attested, however.
Another option would be to re-run your CI for the same release, but pass `skip-existing: true`. That should gracefully handle the files that have already been uploaded and include attestations for the remaining files.
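For reference, that re-run would use the action's existing `skip-existing` input, roughly like this (a sketch; the step name and version tag are whatever the project already pins):

```yaml
- name: Publish to PyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    # Skip files PyPI already has instead of failing the whole job,
    # so only the dists that missed the expired-token window are
    # uploaded (along with their attestations).
    skip-existing: true
```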
(I'm sorry for the pain here -- I'm still not 100% sure why CI jobs are suddenly exceeding the 15 minute token window. I'll be able to look into it more today.)
> You can upload the rest manually, but they won't have attestations. That means your release will only be partially attested, however.
OK, I've done so, and will just note the missing attestations in the release announcement.
Note that when I first tried, I got a 502 Bad Gateway error with twine 4.0.2. When I realized twine was outdated, I installed the newest version and tried again. This time uploading did work, but there was a pause of about 10-15 seconds after each file, and one of them got a 502 response, which twine claimed to retry. I then re-ran it, and all 5 missing items claimed to upload again; checking on PyPI, they really are there now.
Anyway, I'm not trying to report a bug against twine or warehouse here, just noting that the delay after each uploaded file and the sporadic 502 responses may be contributing to slow uploads.
Thanks @QuLogic, that's helpful context. This does sound like it might in fact be a PyPI backend issue -- twine shouldn't really be getting 502s if the package/upload is otherwise well formed.
Out of curiosity, do you know what metadata version you're producing? I'm wondering if the recent metadata 2.3/2.4 problems with twine might also be related here.
No, I believe it's version 2.1.
@woodruffw FYI whatever was causing slow uploads to PyPI stopped causing problems on Sunday, and a few releases within aio-libs have since succeeded.
Though I wouldn't abandon the idea of per-dist token refresh, so the issue remains open.
Our latest release also worked today: https://github.com/matplotlib/matplotlib/actions/runs/12309652164/job/34358661026
Current status:
- [x] supporting PyPI API is in place: https://github.com/pypi/warehouse/issues/18235
- [x] Twine PR implementing this is merged: https://github.com/pypa/twine/issues/1246 / https://github.com/pypa/twine/pull/1249
- [ ] Twine release pending
- [ ] Will merge #360 once there's a Twine version to bump to