
workspace pod never reached Running state: cannot schedule pod due to out of resources, reason: OutOfcpu

Open jenting opened this issue 2 years ago • 1 comments

Bug description

From the tracing, the us60 cluster reports

workspace pod never reached Running state: cannot schedule pod due to out of resources, reason: OutOfcpu

which we did not see on previous clusters.

Describe pod

status:
  message: 'Pod Node didn''t have enough resource: cpu, requested: 2000, used: 15190,
    capacity: 15900'
  phase: Failed
  reason: OutOfcpu
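
The numbers in the status message already show why the pod was rejected. A minimal sketch of that arithmetic follows, assuming the values are CPU millicores and that admission requires `used + requested <= capacity`. The helper `fitsCPU` is hypothetical and written for illustration; it is not code from the kubelet or ws-manager.

```go
package main

import "fmt"

// fitsCPU is a hypothetical helper: it reports whether a pod requesting
// `requested` millicores fits on a node where `used` millicores are already
// claimed out of `capacity`. The units are an assumption.
func fitsCPU(requested, used, capacity int64) bool {
	return used+requested <= capacity
}

func main() {
	// Values copied from the pod status message above.
	requested, used, capacity := int64(2000), int64(15190), int64(15900)

	fmt.Printf("free: %dm, requested: %dm, fits: %v\n",
		capacity-used, requested, fitsCPU(requested, used, capacity))
}
```

With these values only 710m is free on the node while the pod asks for 2000m, so the check fails.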


Steps to reproduce

We don't know for now.

Workspace affected

No response

Expected behavior

No response

Example repository

No response

Anything else?

I think this is because we changed 1m to whole numbers, which request entire cores: [1][2][3][4]

jenting avatar Aug 10 '22 01:08 jenting

@Furisto assigning to you and marking in-progress as it pertains to gen60 and workspace-classes.

kylos101 avatar Aug 10 '22 02:08 kylos101

Maybe a recurrence of https://github.com/kubernetes/kubernetes/issues/106884

Furisto avatar Aug 10 '22 12:08 Furisto

> I think this is because we changed 1m to whole numbers, which request entire cores: [1][2][3][4]

These values affect the limits, not the requests.

Furisto avatar Aug 10 '22 12:08 Furisto

@Furisto my apologies, I was mistaken and linked the wrong values. Doh!

kylos101 avatar Aug 10 '22 13:08 kylos101

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 09 '22 07:11 stale[bot]

We still hit this one on gen70

utam0k avatar Dec 15 '22 06:12 utam0k

gcp log

utam0k avatar Dec 15 '22 06:12 utam0k

Related code: https://github.com/gitpod-io/gitpod/blob/478a75e744a642d9b764de37cfae655bc8b29dd5/components/ws-manager/pkg/manager/manager.go#L372-L399

I think we do reach this line: https://github.com/gitpod-io/gitpod/blob/478a75e744a642d9b764de37cfae655bc8b29dd5/components/ws-manager/pkg/manager/manager.go#L375

But the error returned there was probably not wait.ErrWaitTimeout, so the retry code was never reached: https://github.com/gitpod-io/gitpod/blob/478a75e744a642d9b764de37cfae655bc8b29dd5/components/ws-manager/pkg/manager/manager.go#L395

utam0k avatar Dec 15 '22 06:12 utam0k