
Unexpected error loading prebuild

aledbf opened this issue 3 years ago • 3 comments

Bug description

rpc error: code = FailedPrecondition desc = cannot initialize workspace: prebuild initializer: Git fallback: git initializer gitClone: mkdir /dst/spring-petclinic: no such file or directory

[Screenshot: Jaeger UI trace]

Workspace affected

gitpodio-springpetclini-8a38a5a57eu

Expected behavior

  1. Log when this happens (we don't log this now).
  2. Ideally we'd wait longer until the file system is ready before trying to clone.

Example repository

None

Anything else?

How long do we wait before the file system is ready?
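Regarding item 2 above and this question, a minimal sketch of a bounded wait that could run before attempting the clone; the package, helper name, polling interval, and timeout are assumptions for illustration, not the actual ws-daemon behavior.

```go
package sketch

import (
	"context"
	"fmt"
	"os"
	"time"
)

// waitForPath polls until path exists or the timeout elapses.
// Hypothetical helper; the real readiness handling in ws-daemon may differ.
func waitForPath(ctx context.Context, path string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for {
		if _, err := os.Stat(path); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for %s: %w", path, ctx.Err())
		case <-ticker.C:
		}
	}
}
```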

aledbf avatar Aug 03 '22 20:08 aledbf

We already have a log entry for this: https://github.com/gitpod-io/gitpod/blob/6390f2064394ebccf788f4ff5b57fd66e0313ce1/components/content-service/pkg/initializer/git.go#L75

Also, looking at the logs, it looks like the workspace failed, and then five minutes later ws-daemon tried to run the initializer for it (???): [screenshot]

I can't quite understand what exactly happened in that workspace's lifecycle.

sagor999 avatar Aug 04 '22 22:08 sagor999

Thanks for looking at this one, @sagor999! @jenting, before resuming the PVC work, could you peek at this to see what you can find? It's the last of the broken windows we found in the gen59 traces.

kylos101 avatar Aug 08 '22 00:08 kylos101

  1. Log when this happens (we don't log this now).

We log it already

  1. Ideally we'd wait longer until the file system is ready before trying to clone.

We haven't even run git clone; the error reports that os.MkdirAll(ws.Location, 0775) failed 🤔
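For context, a simplified sketch of the failing step: the initializer creates ws.Location and only then clones into it, so an error here means git never ran. This is an illustration under that assumption, not the actual Gitpod code.

```go
package sketch

import (
	"fmt"
	"os"
	"os/exec"
)

// ensureCloneTarget mirrors the failing step: create the workspace location,
// then clone into it. Because os.MkdirAll creates all missing parents, an
// ENOENT here suggests the parent (/dst) vanished underneath us, and the
// git clone below is never reached.
func ensureCloneTarget(location, remoteURI string) error {
	if err := os.MkdirAll(location, 0775); err != nil {
		return fmt.Errorf("git initializer gitClone: %w", err)
	}
	return exec.Command("git", "clone", remoteURI, location).Run()
}
```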


I thought the error happened at this line: https://github.com/gitpod-io/gitpod/blob/3d97d5552ec092938327c3813d53a04038c3db7f/components/content-service/pkg/initializer/git.go#L75

However, I did not find the isGitWS span in the tracing: https://github.com/gitpod-io/gitpod/blob/3d97d5552ec092938327c3813d53a04038c3db7f/components/content-service/pkg/initializer/git.go#L64 🤔

jenting avatar Aug 08 '22 10:08 jenting

I'm blocked on this issue now.

I can't reproduce it locally; I'm unable to make os.MkdirAll(ws.Location, 0775) fail with no such file or directory. Is it because the mount point /dst/ is not ready yet?
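One way to provoke that exact error locally is a concurrent-removal race: os.MkdirAll creates missing parents, so it only returns no such file or directory when the tree disappears between creating a parent and the final mkdir, which is what a vanishing /dst mount would look like. A rough repro sketch under that assumption (it may need many iterations, or never hit, depending on timing):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	base, err := os.MkdirTemp("", "dst")
	if err != nil {
		panic(err)
	}
	target := filepath.Join(base, "deep", "spring-petclinic")

	// Keep removing the tree in the background while the main loop hammers
	// MkdirAll; when MkdirAll loses the race it returns ENOENT, mimicking
	// the /dst mount disappearing mid-initialization.
	go func() {
		for {
			os.RemoveAll(base)
		}
	}()

	for i := 0; i < 1_000_000; i++ {
		if err := os.MkdirAll(target, 0775); err != nil {
			fmt.Println("reproduced:", err)
			return
		}
	}
	fmt.Println("no failure observed")
}
```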

jenting avatar Aug 11 '22 02:08 jenting

:wave: @jenting were you able to find anything meaningful via Google searches, or recreate similar misbehavior in https://go.dev/play/? I ask so that we can have that context when sharing this issue with a teammate next week.

For now, let's leave blocked. And later this week I'll inspect frequency for this error. Frequency will determine if we reassign to another teammate while you're out on vacation, etc.

kylos101 avatar Aug 11 '22 14:08 kylos101

👋 @jenting were you able to find anything meaningful via Google searches, or recreate similar misbehavior in https://go.dev/play/? I ask so that we can have that context when sharing this issue with a teammate next week.

I did some Google searching and wrote similar code locally to try to reproduce the error, but no luck so far.

jenting avatar Aug 11 '22 14:08 jenting

This is odd. From the log, the error comes from here. However, I can't see that warning log line in the GCP logs.

Note: we filter by instanceId="a7ad0fe1-3ebc-4786-a207-366f8c7c1e47"

jenting avatar Aug 12 '22 08:08 jenting

@jenting are you still blocked and need help from the team (if yes, please reach out in #t_workspace), or do you have more info to go on now because of this thread?

kylos101 avatar Aug 12 '22 13:08 kylos101

If this PR doesn't fix the problem, we will have to write code to check whether the container is still alive: https://github.com/gitpod-io/gitpod/pull/12215
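For illustration only, one shape such a check could take, here using Kubernetes client-go to ask whether the workspace pod still exists before (or while) running content init; the function, its parameters, and the use of client-go are assumptions, not how ws-daemon actually tracks container liveness.

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// podStillExists reports whether the workspace pod is still present.
// Hypothetical sketch: the real check might go through containerd or
// ws-manager instead of the Kubernetes API.
func podStillExists(ctx context.Context, c kubernetes.Interface, namespace, name string) (bool, error) {
	_, err := c.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
```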

utam0k avatar Aug 19 '22 06:08 utam0k

This is related to https://github.com/gitpod-io/gitpod/issues/12282: if StopWorkspace was called while the workspace was still doing content init, it may fail with this exact error, as ws-daemon does not know that the workspace was stopped and /dst has disappeared.
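If that is what happened, a pre-flight guard in ws-daemon could turn the opaque ENOENT into an explicit condition. A sketch of the idea; the function name, the stopped callback, and the parent-directory check are hypothetical:

```go
package sketch

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkInitPreconditions fails fast if the workspace was already stopped or
// its content mount (the parent of ws.Location, e.g. /dst) is gone, instead
// of letting os.MkdirAll surface an opaque "no such file or directory" later.
func checkInitPreconditions(wsLocation string, stopped func() bool) error {
	if stopped() {
		return fmt.Errorf("workspace was stopped before content init started")
	}
	if _, err := os.Stat(filepath.Dir(wsLocation)); err != nil {
		return fmt.Errorf("workspace content mount is gone: %w", err)
	}
	return nil
}
```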

sagor999 avatar Aug 23 '22 00:08 sagor999

This is related to #12282: if StopWorkspace was called while the workspace was still doing content init, it may fail with this exact error, as ws-daemon does not know that the workspace was stopped and /dst has disappeared.

I just wonder: who deletes $wsRoot/dst, kubelet or one of our components? Do you know?

utam0k avatar Aug 23 '22 00:08 utam0k

Could it be that our housekeeping job in ws-daemon does that? :thinking:

sagor999 avatar Aug 23 '22 00:08 sagor999

I just wonder: who deletes $wsRoot/dst, kubelet or one of our components? Do you know?

I have the same question.

Since we are not sure whether #12282 addressed this issue, we might need to consider reopening it.

jenting avatar Aug 23 '22 01:08 jenting

There are probably two patterns in this issue:

  • Initialization runs after the stop request: https://github.com/gitpod-io/gitpod/issues/12282
  • For some reason, initialization starts very late.

I put log links for both patterns in this PR. Perhaps the PR will improve both, but it is unclear whether they will be fully resolved: https://github.com/gitpod-io/gitpod/pull/12215

So, I think if it happens again, we should reopen.

utam0k avatar Aug 23 '22 01:08 utam0k

We need to check the Jaeger tracing on the gen63 cluster to see whether it still happens.

jenting avatar Aug 25 '22 08:08 jenting

It still happens 😭 https://cloudlogging.app.goo.gl/4Hk68KGGpKBS1wyk9

utam0k avatar Aug 29 '22 00:08 utam0k

FYI, we need webapp to add logging so that we can know why StopWorkspace is being called, via https://github.com/gitpod-io/gitpod/issues/12282. Once that is done, we can proceed with this particular issue.

kylos101 avatar Sep 06 '22 19:09 kylos101

Added the Blocked label, because we're waiting for webapp to schedule and implement the logging in https://github.com/gitpod-io/gitpod/issues/12282.

kylos101 avatar Sep 08 '22 19:09 kylos101

This is no longer blocked as of https://github.com/gitpod-io/gitpod/issues/12283

kylos101 avatar Sep 20 '22 14:09 kylos101

During the refinement meeting, @utam0k mentioned he saw the error recently. We don't know exactly how to approach this other than looking at the new logs and trying to understand what's going on. There's currently no hypothesis.

atduarte avatar Sep 27 '22 07:09 atduarte

@jenting @utam0k As this is Scheduled, and not In Progress, I removed you both from the assignees. This way it is "free" for later; when someone has bandwidth, it can be assigned and the status changed accordingly. :smile: Have a nice day, you two! :wave:

kylos101 avatar Oct 04 '22 02:10 kylos101

@kylos101 :100: Thanks

utam0k avatar Oct 04 '22 02:10 utam0k

@sagor999 could you peek at the new logs to see why the workspaces are stopping, to help form a plan of attack for this? I'm going to move this from Scheduled to the Inbox for now.

kylos101 avatar Oct 12 '22 02:10 kylos101

Hm. I looked at the traces (US) and in the GCP logs for that error and cannot find one. :thinking:

sagor999 avatar Oct 12 '22 21:10 sagor999

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 16 '23 06:01 stale[bot]

[Screenshot of the log query results]

There are only a few error messages, so I closed this issue.

utam0k avatar Jan 17 '23 00:01 utam0k