gitpod
registry-facade fails to authenticate against Google Artifact Registry
Bug description
When using Google Artifact Registry (GAR) as the image-build registry, or as an air-gap mirror, workspace starts fail after a period of inactivity. That's because registry-facade can no longer authenticate against GAR. Restarting registry-facade resolves the issue.
In this case, registry-facade reports:
{
  "@type": "type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent",
  "error": "httpReadSeeker: failed open: unexpected status code https://europe-docker.pkg.dev/v2/some-project/some-registry/gitpod/supervisor/blobs/sha256:6bd5243cce7ba86c48d29f5b75f93d11e57b5d69a49ecaddc8b27a8d6e6e5d1d: 401 Unauthorized - Server message: unauthorized: not authenticated: No valid credential was supplied.",
  "level": "error",
  "message": "cannot get blob",
  "serviceContext": {
    "service": "registry-facade",
    "version": "commit-51980e6c1f8a5352f7f7c66957674f89c1e36c58"
  },
  "severity": "ERROR",
  "time": "2022-07-13T10:39:07Z"
}
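To illustrate the suspected failure mode, here is a minimal sketch in Go, assuming the 401 comes from a cached credential that has gone stale. It retries the blob fetch once with a fresh credential. This is not registry-facade's actual code: `client`, `newToken`, `fetchBlob`, and `errUnauthorized` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnauthorized stands in for a "401 Unauthorized" reply from the upstream
// registry (hypothetical; a real client would inspect the HTTP status code).
var errUnauthorized = errors.New("401 Unauthorized")

// client holds a cached credential and a way to obtain a new one.
// Both fields are assumptions made for this sketch.
type client struct {
	token    string
	newToken func() (string, error)
}

// fetchBlob calls fetch with the cached token. If the registry answers with
// 401, it obtains a fresh token and retries exactly once.
func (c *client) fetchBlob(fetch func(token string) ([]byte, error)) ([]byte, error) {
	data, err := fetch(c.token)
	if !errors.Is(err, errUnauthorized) {
		// Success, or an error that re-authenticating would not fix.
		return data, err
	}
	tok, terr := c.newToken()
	if terr != nil {
		return nil, fmt.Errorf("refreshing credential: %w", terr)
	}
	c.token = tok
	return fetch(c.token)
}

func main() {
	c := &client{
		token:    "stale",
		newToken: func() (string, error) { return "fresh", nil },
	}
	// Fake registry: only the fresh token is accepted.
	fetch := func(token string) ([]byte, error) {
		if token != "fresh" {
			return nil, errUnauthorized
		}
		return []byte("blob contents"), nil
	}
	data, err := c.fetchBlob(fetch)
	fmt.Println(string(data), err)
}
```

Retrying only once keeps a genuinely invalid credential from causing a retry loop. Whether registry-facade should refresh on demand like this, or proactively before expiry, is an open design question here.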
Steps to reproduce
Don't use registry-facade for a while
Workspace affected
No response
Expected behavior
No response
Example repository
No response
Anything else?
No response
Thank you for looking at this, @utam0k! FYI, assuming we are able to solve the problem, we'll want to share the fix with the self-hosted team once it's done, so they can judge whether to include it as a hotfix or wait for the next release.
cc: @gitpod-io/engineering-self-hosted
Fwiw, for me this sounds like it's worth including in a hotfix :)
I have not reproduced this issue yet, but I found the Docker Registry v2 token authentication docs, which say:
expires_in (Optional) The duration in seconds since the token was issued that it will remain valid. When omitted, this defaults to 60 seconds. For compatibility with older clients, a token should never be returned with less than 60 seconds to live. https://docs.docker.com/registry/spec/auth/token/
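As a rough sketch of what the quoted `expires_in` rule implies for a client that caches tokens: the Go snippet below applies the 60-second default when the field is omitted and reports when a token should be renewed. The `cachedToken` type, its fields, and the safety margin are assumptions made for illustration. This is not how registry-facade or GAR is known to behave, and measuring the lifetime from the time the response was received is a simplification.

```go
package main

import (
	"fmt"
	"time"
)

// defaultTokenTTL is the lifetime the quoted spec prescribes when the token
// response omits expires_in.
const defaultTokenTTL = 60 * time.Second

// cachedToken is a hypothetical record of a token response.
type cachedToken struct {
	Value      string
	ExpiresIn  int       // seconds, from the token response; 0 if omitted
	ReceivedAt time.Time // when the response arrived (assumed start of validity)
}

// ttl returns the token lifetime, falling back to the 60-second default.
func (t cachedToken) ttl() time.Duration {
	if t.ExpiresIn <= 0 {
		return defaultTokenTTL
	}
	return time.Duration(t.ExpiresIn) * time.Second
}

// expiry returns the moment the token stops being valid.
func (t cachedToken) expiry() time.Time {
	return t.ReceivedAt.Add(t.ttl())
}

// needsRefresh reports whether the token should be renewed at time now,
// treating it as expired margin early to avoid racing the real deadline.
func (t cachedToken) needsRefresh(now time.Time, margin time.Duration) bool {
	return !now.Before(t.expiry().Add(-margin))
}

func main() {
	tok := cachedToken{Value: "example", ReceivedAt: time.Now()}
	fmt.Println("valid for:", tok.ttl())
	fmt.Println("refresh now?", tok.needsRefresh(time.Now(), 10*time.Second))
}
```

A cache that never checks expiry would keep presenting an old token after a quiet period, which would match the symptom described in this issue. That remains a hypothesis until the problem is reproduced.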
We've historically found with GAR that, like GCR, you need to include both the URL and the server address (docs). Have you configured it like that?
@MrSimonEmms Thanks for your help. I followed this instruction. https://github.com/gitpod-io/gitpod/pull/10266
We need more info here. How was auth set up for GAR? Was it a token or a service account? As @utam0k mentioned, we cannot repro this issue so far.
According to the affected customer, they are using a service account. So it is odd indeed that it loses authentication after some number of hours. :thinking: Could it be that GAR does something extra here and requires you to log in again after a certain number of hours?
We are waiting on the customer to provide a bit more info. Some more info is also posted in this issue: https://github.com/gitpod-io/customers/issues/71
@sagor999 @utam0k let's leave this in in-progress while waiting for feedback. 🙏 In general, we shouldn't move things backwards to Breakdown or Scheduled, unless of course we find that an issue we closed needs to be reopened because it is happening again in production.
@utam0k I added the blocked label, removed Pavel as assignee (thank you for your feedback, @sagor999), and added a note on the project to indicate we're waiting on customer feedback. Please leave this in in-progress for now, and refer to the scheduled groundwork column. 🙏 We won't be able to resume this until September (we're waiting for customer feedback).
I have created a snapshot of this preview env and deleted the preview env to save money.
Removing the related high priority for now; I have reached out to @julia-leyton for help.
@julia-leyton I am going to close this issue for now, as we could not recreate it. If the customer is able to recreate it and share a related support bundle, please let us know. Happy to reopen.