
[FEATURE]: More informative errors when things timeout because of lack of resources

Open eysi09 opened this issue 2 years ago • 3 comments

Feature Request

Background / Motivation

Sometimes Garden commands time out because a given Pod can't be scheduled due to a lack of cluster resources.

These types of errors are pretty hard to debug unless you look at the cluster while the Pod is still being scheduled. This is particularly opaque for CI runs.

I'd love for Garden to tell me that a given command timed out because of this reason (i.e. 0/X nodes available).

🌹 It’s a nice to have, but nice things are nice 🙂

eysi09 avatar May 25 '22 16:05 eysi09

Usually pods will get pending status in k8s if there's a lack of resources.
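As a rough sketch of what that check could look like: when the scheduler can't place a Pod, Kubernetes leaves it in the `Pending` phase and sets a `PodScheduled` condition with status `False` and reason `Unschedulable`, with a message along the lines of "0/3 nodes are available: 3 Insufficient cpu.". The helper below is hypothetical (the name `describeUnschedulable` and the trimmed-down interfaces are assumptions, not existing Garden code) and only inspects a status object that has already been fetched.

```typescript
// Minimal subset of the Kubernetes Pod status fields used here.
interface PodCondition {
  type: string
  status: "True" | "False" | "Unknown"
  reason?: string
  message?: string
}

interface PodStatus {
  phase?: string
  conditions?: PodCondition[]
}

// Hypothetical helper: returns a hint if the Pod is pending because the
// scheduler could not place it, otherwise undefined.
function describeUnschedulable(podName: string, status: PodStatus): string | undefined {
  if (status.phase !== "Pending") {
    return undefined
  }
  const cond = (status.conditions ?? []).find(
    (c) => c.type === "PodScheduled" && c.status === "False" && c.reason === "Unschedulable"
  )
  if (!cond) {
    return undefined
  }
  // The scheduler's message usually carries the "0/X nodes are available" detail.
  const detail = cond.message ? `: ${cond.message}` : ""
  return `Pod ${podName} could not be scheduled${detail}`
}
```

A timeout handler could call something like this for the Pods it was waiting on and append the returned hint to the timeout error, if there is one.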

I could test around and try to see if there are easy improvements that could be made around this, but I'd need to set up a k8s cluster (probably local) that can be pushed to its limits for ease of testing.

Orzelius avatar May 25 '22 17:05 Orzelius

It should be fairly easy to reproduce by setting really high resource requests on a given workload.
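For example, a throwaway Pod manifest with requests far beyond what any node offers should stay `Pending`. This is only a sketch: the function name, the default values, and the choice of the `registry.k8s.io/pause:3.9` image are assumptions picked for illustration, and the requests need to exceed whatever your test cluster actually has.

```typescript
// Trimmed-down shape of a Pod manifest, just enough for this example.
interface PodManifest {
  apiVersion: "v1"
  kind: "Pod"
  metadata: { name: string }
  spec: {
    containers: {
      name: string
      image: string
      resources: { requests: Record<string, string> }
    }[]
  }
}

// Hypothetical helper: builds a Pod whose resource requests are deliberately
// too large for a typical cluster, so the scheduler cannot place it.
function makeUnschedulablePod(name: string, cpu = "1000", memory = "1000Gi"): PodManifest {
  return {
    apiVersion: "v1",
    kind: "Pod",
    metadata: { name },
    spec: {
      containers: [
        {
          name: "pause",
          image: "registry.k8s.io/pause:3.9",
          resources: { requests: { cpu, memory } },
        },
      ],
    },
  }
}

// kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
const manifestJson = JSON.stringify(makeUnschedulablePod("resource-hog"), null, 2)
```

The same idea applies to a Garden-managed workload: bump its resource requests in the config until the cluster can no longer fit it.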

Garden will time out on most operations, so I suppose the idea would be to add some context to the timeout if we can. But there are a lot of ways things can time out.

I'd suggest starting with build timeouts and seeing if we can tell the user whether things timed out due to any of the following:

  • Garden couldn't deploy Buildkit/Kaniko
  • The deployment registry wasn't reachable
  • The builder became unresponsive
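
One way the three cases above could be mapped to messages is sketched below. Everything here is hypothetical: the `BuildTimeoutProbe` shape, the cause names, and the message wording are assumptions made for illustration, and how each flag would actually be determined (e.g. checking the builder Deployment, pinging the registry) is left out.

```typescript
type BuildTimeoutCause =
  | "builder-not-deployed"
  | "registry-unreachable"
  | "builder-unresponsive"
  | "unknown"

// Hypothetical results of checks run after a build has timed out.
interface BuildTimeoutProbe {
  builderDeployed: boolean
  registryReachable: boolean
  builderResponding: boolean
}

// Checks are ordered so the earliest failure in the build flow wins.
function classifyBuildTimeout(probe: BuildTimeoutProbe): BuildTimeoutCause {
  if (!probe.builderDeployed) {
    return "builder-not-deployed"
  }
  if (!probe.registryReachable) {
    return "registry-unreachable"
  }
  if (!probe.builderResponding) {
    return "builder-unresponsive"
  }
  return "unknown"
}

const timeoutMessages: Record<BuildTimeoutCause, string> = {
  "builder-not-deployed": "Build timed out: the in-cluster builder (Buildkit/Kaniko) could not be deployed.",
  "registry-unreachable": "Build timed out: the deployment registry was not reachable.",
  "builder-unresponsive": "Build timed out: the builder became unresponsive.",
  "unknown": "Build timed out.",
}

function describeBuildTimeout(probe: BuildTimeoutProbe): string {
  return timeoutMessages[classifyBuildTimeout(probe)]
}
```

Falling back to a plain "Build timed out." keeps the current behaviour whenever none of the checks explains the failure.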

Not super hi-pri though, but certainly a nice to have.

eysi09 avatar Jun 05 '22 14:06 eysi09

This issue has been automatically marked as stale because it hasn't had any activity in 90 days. It will be closed in 14 days if no further activity occurs (e.g. changing labels, comments, commits, etc.). Please feel free to tag a maintainer and ask them to remove the label if you think it doesn't apply. Thank you for submitting this issue and helping make Garden a better product!

stale[bot] avatar Sep 21 '22 03:09 stale[bot]

This issue has been automatically marked as stale because it hasn't had any activity in 90 days. It will be closed in 14 days if no further activity occurs (e.g. changing labels, comments, commits, etc.). Please feel free to tag a maintainer and ask them to remove the label if you think it doesn't apply. Thank you for submitting this issue and helping make Garden a better product!

stale[bot] avatar Jan 07 '23 20:01 stale[bot]