gitpod [ws-daemon] cannot find workspace during WaitForInit

Bug description

Screenshot from 2022-07-27 22-04-19

Steps to reproduce

Workspace affected

No response

Expected behavior

Log the error when it's encountered by the service (instead of just returning it), not only when consumers encounter it. Include the instance ID in the log message so we have that context.

Example repository

No response

Anything else?

This is just the first step for now: get some insight into whether this condition is happening, with related context, so we can react later.

Jul 28 '22 02:07 aledbf

@jenting 👋 hey bud, I'm not sure if you have an open PR for this, even draft is okay, can you link it to this issue?

Aug 02 '22 14:08 kylos101

> @jenting 👋 hey bud, I'm not sure if you have an open PR for this, even draft is okay, can you link it to this issue?

Nope, I haven't found the root cause. Unassigning myself.

Aug 03 '22 01:08 jenting

We added special handling in ws-manager for the gRPC NotFound error.

https://github.com/gitpod-io/gitpod/blob/e40e43d76120e5de702522e6b816f28b86a219c6/components/ws-manager/pkg/manager/monitor.go#L684-L698
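A minimal sketch of what such handling can look like. It assumes the handling amounts to re-running content initialization when WaitForInit reports NotFound, rather than failing the workspace. The `daemon` interface, `waitForContent`, `fakeDaemon` and `errNotFound` are all hypothetical, and the sentinel error stands in for a gRPC status with code NotFound. This is not the linked ws-manager code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a gRPC status with code NotFound.
var errNotFound = errors.New("not found")

// daemon is a hypothetical, minimal view of the ws-daemon API used by the
// manager: start content initialization, and wait for it to finish.
type daemon interface {
	InitWorkspace(instanceID string) error
	WaitForInit(instanceID string) error
}

// waitForContent calls WaitForInit and, if ws-daemon does not know the
// workspace (NotFound), re-runs initialization once and waits again instead
// of failing the workspace outright.
func waitForContent(d daemon, instanceID string) error {
	err := d.WaitForInit(instanceID)
	if !errors.Is(err, errNotFound) {
		return err // nil on success, or an unrelated error
	}
	if err := d.InitWorkspace(instanceID); err != nil {
		return fmt.Errorf("re-initializing %s: %w", instanceID, err)
	}
	return d.WaitForInit(instanceID)
}

// fakeDaemon does not know any workspace until InitWorkspace is called for it.
type fakeDaemon struct{ known map[string]bool }

func (f *fakeDaemon) InitWorkspace(id string) error { f.known[id] = true; return nil }

func (f *fakeDaemon) WaitForInit(id string) error {
	if !f.known[id] {
		return errNotFound
	}
	return nil
}

func main() {
	d := &fakeDaemon{known: map[string]bool{}}
	fmt.Println(waitForContent(d, "instance-1")) // <nil>: recovered by re-initializing
}
```

Only NotFound triggers the retry; any other error is returned unchanged so real failures are not masked.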

Aug 03 '22 07:08 jenting

I looked through every instance of this in the last 12 hours, and > 90% of the workspaces recover because we retry the initialization, as @jenting pointed out above.

Aug 03 '22 12:08 Furisto

Still in us60:

Aug 10 '22 01:08 kylos101

Still in us63

Aug 24 '22 13:08 jenting

One possible cause of this is this: https://github.com/gitpod-io/gitpod/issues/12357

Aug 24 '22 19:08 sagor999

@sagor999 assigning you because https://github.com/gitpod-io/gitpod/pull/12360 is not deployed yet

Aug 26 '22 16:08 kylos101

We still see this in the us64 cluster.

Sep 07 '22 07:09 jenting

Moved back to breakdown: since we're still seeing this in us64, we should discuss a strategy for how to proceed in refinement and update the issue description before moving this back to scheduled.

Sep 09 '22 03:09 kylos101

This is not an error anymore. It is simply a byproduct of how gRPC logs its errors in tracing. When we call /wsdaemon.WorkspaceContentService/WaitForInit for a workspace that has already been disposed, we need to return the correct error: NotFound. This is normal behaviour. Unfortunately, it gets logged as an error even though NotFound is not an error here, since we have already disposed of the workspace. finalizeWorkspaceContent handles it correctly.

finalizeWorkspaceContent might also get called multiple times, since we call it every time the pod sees any update to its state. I guess once we switch to wsman Mk2, we will be able to store that state in its own CRD object instead of storing it on the pod.

Sep 12 '22 21:09 sagor999

@sagor999 thank you for reopening! I removed the PR from the Workspace project, as the issue is already there and In-Progress.

Oct 04 '22 05:10 kylos101

Thank you @jenting for linking this issue and PR! :smile:

Oct 04 '22 05:10 kylos101
