gh-action-pypi-publish
[TODO] Explore handling HTTP errors on Rekor flakiness
@woodruffw @facutuesca we recently saw an HTTP 502 and a traceback in the attestations flow:
Traceback (most recent call last):
  File "/root/.local/lib/python3.12/site-packages/sigstore/_internal/rekor/client.py", line 160, in post
    resp.raise_for_status()
  File "/root/.local/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://rekor.sigstore.dev/api/v1/log/entries/

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/attestations.py", line 149, in <module>
    main()
  File "/app/attestations.py", line 145, in main
    attest_dist(dist_path, attestation_path, signer)
  File "/app/attestations.py", line 114, in attest_dist
    attestation = Attestation.sign(signer, dist)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/pypi_attestations/_impl.py", line 200, in sign
    bundle = signer.sign_dsse(stmt)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/sigstore/sign.py", line 230, in sign_dsse
    return self._finalize_sign(cert, content, proposed_entry)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/sigstore/sign.py", line 189, in _finalize_sign
    entry = self._signing_ctx._rekor.log.entries.post(proposed_entry)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/sigstore/_internal/rekor/client.py", line 162, in post
    raise RekorClientError(http_error)
sigstore._internal.rekor.client.RekorClientError: Rekor returned an unknown error with HTTP 502
(https://github.com/aio-libs/aiohttp/actions/runs/15359675323/job/43225662768#step:9:384)
Mind taking a look?
Maybe this should be fixed in the sigstore lib; I'm not sure. If not, I suspect Twine would need some handling as well.
Yeah, it looks like there was some kind of Rekor hiccup/short outage over the weekend -- pinging @haydentherapper since he might know more 🙂
Looks like there are two things sigstore-python could do better here:
- We could probably produce a more explanatory error here (both for API and CLI users)
- We should probably have some kind of retry handling for Rekor API calls, although in this case that probably wouldn't have helped much
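For illustration, the two points above could be combined roughly as follows. This is only a sketch and not sigstore-python's actual code: `ServiceHTTPError`, `ServiceUnavailableError`, `call_with_retries`, and the set of status codes treated as transient are all hypothetical names and choices made up for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: gateway-style statuses are worth retrying; the real set may differ.
TRANSIENT_STATUSES = frozenset({502, 503, 504})


class ServiceHTTPError(Exception):
    """Hypothetical stand-in for an HTTP failure that carries its status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"service returned HTTP {status}")
        self.status = status


class ServiceUnavailableError(Exception):
    """Hypothetical error raised once every retry has failed."""


def call_with_retries(
    func: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient HTTP failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ServiceHTTPError as exc:
            # Non-transient statuses (e.g. 400) are re-raised untouched.
            if exc.status not in TRANSIENT_STATUSES:
                raise
            if attempt == attempts:
                # Explain the likely cause instead of surfacing a bare status.
                raise ServiceUnavailableError(
                    f"transparency log still returned HTTP {exc.status} after "
                    f"{attempts} attempts; this looks like a service outage "
                    "rather than a problem with the artifact being signed"
                ) from exc
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 50% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

A caller would wrap the log submission in a zero-argument callable and pass it to `call_with_retries`. During a longer outage the retries simply run out, so the clearer final error is the part that helps most in that case.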
yeah, that's exactly what I was thinking
Yea, there was a brief outage over the weekend, still investigating root cause.
I just wanted to chime in and say this bug has been affecting me for a couple hours, so I had to revert to setting attestations: false in my GitHub Actions workflow. Any idea how frequently this occurs?
We're having an outage at the moment; this appears to be due to our cloud provider, not Rekor itself. We'll update as there's more information.
I see, that's interesting because my original action had an identical stack with the Rekor error: https://github.com/we3lab/pype-schema/actions/runs/16766504059/job/47472565963
And with attestations: false it works: https://github.com/we3lab/pype-schema/actions/runs/16766590355
Figured I would share in case it helps debug.
GCP's issue appears to have been resolved, requests are now succeeding.
Looks like there was another outage yesterday: #376 / #377. I wonder what we can do about it.
cc @woodruffw @haydentherapper
Yeah, this one looks even more concerning to me: it looks like both Rekor and Fulcio were failing for a period. I've raised it on the Sigstore slack.