[ws-daemon] cannot find workspace during WaitForInit
Bug description
Steps to reproduce
Workspace affected
No response
Expected behavior
Log the error when it's encountered by the service (instead of just returning an error), not just when consumers encounter it. Include the instance ID as part of the log message, so we have that context.
Example repository
No response
Anything else?
This is just a first step for now: get some insight into whether this condition is happening, with related context, so we can react later.
@jenting 👋 hey bud, I'm not sure if you have an open PR for this, even draft is okay, can you link it to this issue?
Nope, I haven't found the root cause. Unassigning myself.
We added special handling within ws-manager for the gRPC NotFound error:
https://github.com/gitpod-io/gitpod/blob/e40e43d76120e5de702522e6b816f28b86a219c6/components/ws-manager/pkg/manager/monitor.go#L684-L698
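In spirit, that special handling retries the content initialization whenever `WaitForInit` reports NotFound. A hedged, stdlib-only sketch of the idea (function and variable names are illustrative, not ws-manager's actual identifiers):

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the gRPC NotFound status code.
var errNotFound = errors.New("not found")

// initializeWithRetry re-runs the content initializer whenever waitForInit
// reports NotFound, up to maxAttempts. Any other error aborts immediately.
func initializeWithRetry(waitForInit func() error, initialize func() error, maxAttempts int) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = waitForInit()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errNotFound) {
			return err // only NotFound triggers a re-initialization
		}
		if initErr := initialize(); initErr != nil {
			return initErr
		}
	}
	return err
}

func main() {
	calls := 0
	// Fails with NotFound twice, then succeeds, as most workspaces do.
	wait := func() error {
		calls++
		if calls < 3 {
			return errNotFound
		}
		return nil
	}
	inits := 0
	initialize := func() error { inits++; return nil }

	err := initializeWithRetry(wait, initialize, 5)
	fmt.Println(err == nil, inits) // true 2
}
```

This matches the observation below that most affected workspaces recover: the retry path turns a transient NotFound into a successful initialization.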
Looked through every instance of this in the last 12 hours, and > 90% of workspaces recover, because we retry the initialization as pointed out by @jenting above.
Still in us60.

Still in us63.
One possible cause of this is this: https://github.com/gitpod-io/gitpod/issues/12357
@sagor999 assigning you because https://github.com/gitpod-io/gitpod/pull/12360 is not deployed yet
We still see this in the us64 cluster.
Moved back to breakdown, since we're still seeing this in us64. We should talk about a strategy in refinement on how to proceed, and update the issue description, prior to moving this back to scheduled.
This is not an error any more. It is simply a by-product of how gRPC logs its errors in tracing. When we call /wsdaemon.WorkspaceContentService/WaitForInit, we need to return a correct error: NotFound. This is normal behaviour. Unfortunately that gets logged as an error, even though NotFound is not considered an error here, since we already disposed of the workspace. It is handled correctly by finalizeWorkspaceContent. And finalizeWorkspaceContent might get called multiple times, since we run it every time the pod sees any update to its state.
I guess once we switch to wsman Mk2, we will be able to store state in its own CRD object, instead of storing it on the pod.
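The behaviour described above can be sketched as follows: because the finalizer may run on every pod update, it must treat NotFound as "already disposed" rather than as a failure. All names here are hypothetical stand-ins, not the actual ws-manager code:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the gRPC NotFound status code.
var errNotFound = errors.New("not found")

// finalize sketches an idempotent disposal step: after the workspace has
// already been disposed, a repeat call yields NotFound, which is expected
// and swallowed rather than surfaced as an error.
func finalize(dispose func(id string) error, id string) error {
	err := dispose(id)
	if errors.Is(err, errNotFound) {
		// Workspace already disposed: normal on repeated pod updates.
		return nil
	}
	return err
}

func main() {
	disposed := map[string]bool{}
	dispose := func(id string) error {
		if disposed[id] {
			return errNotFound
		}
		disposed[id] = true
		return nil
	}
	fmt.Println(finalize(dispose, "ws-1") == nil) // true: first disposal
	fmt.Println(finalize(dispose, "ws-1") == nil) // true: NotFound swallowed
}
```

The remaining noise is purely cosmetic: gRPC tracing still records the NotFound as an error even though the finalizer handles it correctly.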
@sagor999 thank you for reopening! I removed the PR from the Workspace project, as the issue is already there and In-Progress.
Thank you @jenting for linking this issue and PR! :smile: