
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

felipecrs opened this issue 2 years ago • 45 comments

Hello, I'm receiving this error when installing one of my charts:

Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

I'm afraid I cannot share the chart itself as it's an internal chart, but searching through the issues in Helm I could not find any direct matches. Most of them say context deadline exceeded, like these:

  • https://github.com/helm/helm/issues/9761
  • https://github.com/helm/helm/issues/7997

But none says would exceed context deadline.

Is there any debug tips someone can share? I don't even know where to begin.

Things I tried already:

  • Building Helm with this PR: https://github.com/helm/helm/pull/10715
  • Setting HELM_BURST_LIMIT=200
  • Building Helm with this PR: https://github.com/felipecrs/helm/pull/2
  • (new) Removing --cleanup-on-fail

None of these changes the result. With debug logs enabled:

ready.go:287: [debug] Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

The Helm command line:

helm upgrade --install my-chart ./my-chart --wait --wait-for-jobs --timeout 1800s --cleanup-on-fail --create-namespace --namespace default --values my-values.yaml --reset-values --history-max 10 --debug

Output of helm version:

version.BuildInfo{Version:"v3.12.1", GitCommit:"f32a527a060157990e2aa86bf45010dfb3cc8b8d", GitTreeState:"clean", GoVersion:"go1.20.5"}

Output of kubectl version:

Client Version: v1.24.12
Kustomize Version: v4.5.4
Server Version: v1.24.12+k3s1

Cloud Provider/Platform (AKS, GKE, Minikube etc.): K3S

felipecrs avatar Jun 16 '23 22:06 felipecrs

I stumbled upon the same error message. After trying back and forth several times, I realized that the timeout had been reached. Increasing the timeout helped me.

E.g. 6 minutes timeout:

helm install/upgrade --timeout 360

@felipecrs I'm not sure, but I think you need to remove the s at the end of the timeout. (I have not tested this)

ghost avatar Jun 26 '23 05:06 ghost

This error looks to have come from Kubernetes / client-go: https://github.com/kubernetes/client-go/blob/9186f40b189c640d4188387901905c728f291f17/rest/request.go#L615

Given the function name there, I think (guess) that you're being throttled / API requests are being rate limited. The machinery is detecting that the delay needed to stay under the API limits would exceed the context's deadline.
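For anyone trying to pin this down: the inner error can be reproduced standalone with golang.org/x/time/rate, which backs client-go's default token-bucket limiter. A minimal sketch, assuming a drained bucket and a context whose deadline is nearer than the required wait:

package main

import (
    "context"
    "fmt"
    "time"

    "golang.org/x/time/rate"
)

func main() {
    // A limiter allowing 1 event/sec with burst 1; consume the only
    // token so the next Wait would have to sleep ~1s.
    limiter := rate.NewLimiter(rate.Limit(1), 1)
    limiter.Allow()

    // A context whose deadline arrives before that sleep could finish.
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()

    // Wait fails up front: it can tell the needed delay would outlive
    // the context, without actually sleeping.
    err := limiter.Wait(ctx)
    fmt.Println(err) // rate: Wait(n=1) would exceed context deadline
}

Which would match the report above: once the deadline is nearly exhausted, even a tiny rate-limit wait fails with this message.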

gjenkins8 avatar Jun 26 '23 15:06 gjenkins8

It looks like client-go changed "recently" to decorate this error: https://github.com/kubernetes/client-go/commit/147848c452865ee870eafa8eba223849c40e791c

So perhaps the error message here is new(ish), rather than the error cause.
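So the message has two layers: the inner text from golang.org/x/time/rate, and the prefix added by that commit in client-go. A rough sketch of the wrapping (paraphrased, not the verbatim client-go source):

package main

import (
    "errors"
    "fmt"
)

func main() {
    // Inner error, as produced by golang.org/x/time/rate.
    inner := errors.New("rate: Wait(n=1) would exceed context deadline")

    // client-go's request throttling wraps it with its own prefix
    // (paraphrased from the commit linked above).
    outer := fmt.Errorf("client rate limiter Wait returned an error: %w", inner)

    fmt.Println(outer)
    // client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
}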

gjenkins8 avatar Jun 26 '23 15:06 gjenkins8

I stumbled upon the same error message. After trying back and forth several times, I realized that the timeout had been reached. Increasing the timeout helped me.

E.g. 6 minutes timeout:

helm install/upgrade --timeout 360

@felipecrs I'm not sure, but I think you need to remove the s at the end of the timeout. (I have not tested this)

Thanks for the inputs, but:

  • Raising the timeout doesn't make any difference for me. My timeout is already 1800s (30 minutes), and even when bumping it to 2400s (40 minutes), my install/upgrade command errors at ~20 minutes.
  • --timeout with the s suffix is fine :) (see the sketch below)
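Regarding the suffix: Helm's --timeout is a duration-typed flag, so values go through Go's duration parsing, which requires a unit. A quick sketch, assuming time.ParseDuration semantics:

package main

import (
    "fmt"
    "time"
)

func main() {
    // A unit suffix is required; a bare number does not parse.
    for _, s := range []string{"1800s", "30m", "360"} {
        d, err := time.ParseDuration(s)
        fmt.Printf("%q -> %v, err=%v\n", s, d, err)
    }
    // "1800s" -> 30m0s, err=<nil>
    // "30m" -> 30m0s, err=<nil>
    // "360" -> 0s, err=time: missing unit in duration "360"
}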

felipecrs avatar Jun 26 '23 15:06 felipecrs

(though note there isn't a return statement in the error code block there)

[screenshot of the error-handling code in client-go's request.go]

felipecrs avatar Jun 26 '23 15:06 felipecrs

(though note there isn't a return statement in the error code block there)

Thanks, yeah, I noticed that too after posting, and updated my post hoping to head off any confusion.

gjenkins8 avatar Jun 26 '23 22:06 gjenkins8

Does the --debug flag produce any additional helpful output?

Raising the timeout doesn't make any difference for me. My timeout is already 1800s (30 minutes), and even when bumping it to 2400s (40 minutes), my install/upgrade command errors at ~20 minutes.

I'm not sure. It very much seems the code producing this error is the throttle code for rate-limiting Kubernetes API access. Given those numbers, the calculated delay would need to be >20 mins for this error to appear.
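If genuine API throttling were the cause, the knobs are client-go's QPS and Burst on rest.Config; the burst side appears to be what HELM_BURST_LIMIT feeds. A hedged sketch of raising them in a plain client-go program (kubeconfig path resolution is illustrative):

package main

import (
    "fmt"

    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load a rest.Config from the default kubeconfig location.
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }

    // client-go's defaults are QPS=5, Burst=10; raising them widens the
    // token bucket that produces the error discussed here.
    config.QPS = 100
    config.Burst = 200

    fmt.Printf("QPS=%v Burst=%v\n", config.QPS, config.Burst)
}

But given that raising HELM_BURST_LIMIT made no difference for you, plain throttling seems unlikely.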

gjenkins8 avatar Jun 26 '23 22:06 gjenkins8

Does the --debug flag produce any additional helpful output?

Unfortunately no:

With debug logs enabled:

ready.go:287: [debug] Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

The lines above these are just many more of the same Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready messages.

felipecrs avatar Jun 27 '23 00:06 felipecrs

ready.go:393: [debug] StatefulSet is not ready: default/my-statefulset. 1 out of 2 expected pods are ready
Error: INSTALLATION FAILED: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
INSTALLATION FAILED
main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:147
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/[email protected]/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/[email protected]/command.go:1044
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/[email protected]/command.go:968
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:250
runtime.goexit
        runtime/asm_amd64.s:1598

felipecrs avatar Jul 03 '23 17:07 felipecrs

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions[bot] avatar Nov 03 '23 00:11 github-actions[bot]

My final conclusion about this issue is that it is equivalent to a regular Helm timeout. I would recommend that Helm catch this error and translate it into a regular timeout error message, to avoid confusion for future users.
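To illustrate, a minimal sketch of such a translation — friendlyTimeout is a hypothetical helper, not Helm's actual code:

package main

import (
    "context"
    "errors"
    "fmt"
    "strings"
)

// friendlyTimeout is a hypothetical helper: translate the rate
// limiter's wording into a plain timeout message.
func friendlyTimeout(ctx context.Context, err error) error {
    if err == nil {
        return nil
    }
    // The rate limiter fails *before* the deadline actually passes, so
    // ctx.Err() may still be nil; fall back to inspecting the message.
    if errors.Is(ctx.Err(), context.DeadlineExceeded) ||
        strings.Contains(err.Error(), "would exceed context deadline") {
        return fmt.Errorf("timed out waiting for the release (--timeout reached): %w", err)
    }
    return err
}

func main() {
    waitErr := errors.New("client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline")
    fmt.Println(friendlyTimeout(context.Background(), waitErr))
}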

Any opinions?

felipecrs avatar Nov 04 '23 16:11 felipecrs

This error may occur for various reasons, not only CPU throttling. It could be triggered by a CrashLoopBackOff in a pod while helm upgrade is waiting for a successful deployment. Alternatively, it might result from a failure in a database initialization job, among other possibilities.

My final conclusion about this issue is that it is equivalent to a regular Helm timeout. I would recommend that Helm catch this error and translate it into a regular timeout error message, to avoid confusion for future users.

Any opinions?

Yes, I would agree with that conclusion; it actually looks like a deployment timeout. The current error message is indeed confusing, and I support the suggestion to have Helm catch this error and translate it into a standard timeout message to improve clarity for users.

giuliohome avatar Dec 04 '23 14:12 giuliohome

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions[bot] avatar Mar 04 '24 00:03 github-actions[bot]

Still valid.

felipecrs avatar Mar 04 '24 00:03 felipecrs

My final conclusion about this issue is that it is equivalent to a regular helm timeout.

Can you please quickly summarize why this is the case? (For someone who has lost track of the exact details here 😓, and for future readers of this issue)

gjenkins8 avatar Mar 04 '24 03:03 gjenkins8

I think the simplest way to confirm this is to check how long Helm took to fail with this error. It should match the duration you specified with --timeout.

felipecrs avatar Mar 04 '24 04:03 felipecrs

Facing a similar issue: client rate limiter Wait returned an error: context deadline exceeded. I started facing this when I added a pod anti-affinity rule to my Helm chart.

suyash0103 avatar May 13 '24 17:05 suyash0103

getting the same error. Still valid

sanan-wow avatar May 29 '24 23:05 sanan-wow

Getting the exact same error. Still valid.

SohailLS avatar Jun 20 '24 12:06 SohailLS

Same issue

jbatmalle avatar Jul 26 '24 09:07 jbatmalle

Still valid

stevec-dubber avatar Aug 05 '24 01:08 stevec-dubber

still valid

dinesh-rajaveeran avatar Aug 20 '24 14:08 dinesh-rajaveeran

+1 also experiencing this

guipace avatar Aug 26 '24 13:08 guipace

+1 also experiencing this

kebab-mai-haddi avatar Aug 26 '24 17:08 kebab-mai-haddi

still valid

michael-a-antinucci avatar Aug 31 '24 20:08 michael-a-antinucci

This also suddenly started affecting us today (deploying to AKS 1.29.6 with Helm 3.14.4 running on Bitbucket Pipelines). It does match Helm's --timeout for us. Not a breaking issue once you find out the cause, but given that Helm has its own, much clearer error upon reaching the timeout, this should be fixed to actually display that error.

EndymionWight avatar Sep 04 '24 04:09 EndymionWight

still valid

stevec-dubber avatar Sep 22 '24 23:09 stevec-dubber

+1 still a valid issue.

teshsharma avatar Oct 17 '24 17:10 teshsharma

+1 And still valid

jrwhetse avatar Oct 17 '24 23:10 jrwhetse

+1 still valid

MaroIT avatar Nov 02 '24 12:11 MaroIT