
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

felipecrs opened this issue 2 years ago • 45 comments

Hello, I'm receiving this error when installing one of my charts:

Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

I'm afraid I cannot share the chart itself as it's an internal chart, but searching through the issues in Helm I could not find any direct matches. Most of them say context deadline exceeded, like these:

  • https://github.com/helm/helm/issues/9761
  • https://github.com/helm/helm/issues/7997

But none says would exceed context deadline.

Is there any debug tips someone can share? I don't even know where to begin.

Things I tried already:

  • Building Helm with this PR: https://github.com/helm/helm/pull/10715
  • Setting HELM_BURST_LIMIT=200
  • Building Helm with this PR: https://github.com/felipecrs/helm/pull/2
  • (new) Removing --cleanup-on-fail

None of these changes the result. With debug logs enabled:

ready.go:287: [debug] Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

The Helm command line:

helm upgrade --install my-chart ./my-chart --wait --wait-for-jobs --timeout 1800s --cleanup-on-fail --create-namespace --namespace default --values my-values.yaml --reset-values --history-max 10 --debug

Output of helm version:

version.BuildInfo{Version:"v3.12.1", GitCommit:"f32a527a060157990e2aa86bf45010dfb3cc8b8d", GitTreeState:"clean", GoVersion:"go1.20.5"}

Output of kubectl version:

Client Version: v1.24.12
Kustomize Version: v4.5.4
Server Version: v1.24.12+k3s1

Cloud Provider/Platform (AKS, GKE, Minikube etc.): K3S

felipecrs avatar Jun 16 '23 22:06 felipecrs

I stumbled upon the same error message. After trying back and forth several times, I realized that the timeout had been reached. Increasing the timeout helped me.

E.g. 6 minutes timeout:

helm install/upgrade --timeout 360

@felipecrs I'm not sure, but I think you need to remove the s at the end of the timeout. (I have not tested this)

ghost avatar Jun 26 '23 05:06 ghost

This error looks to have come from Kubernetes / client-go: https://github.com/kubernetes/client-go/blob/9186f40b189c640d4188387901905c728f291f17/rest/request.go#L615

Given the function name there, I think (guess) that you're being throttled / API requests are being rate limited. The machinery is detecting that the delay needed to stay under the API limits would exceed the context's deadline.
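For anyone trying to pin this down: the inner error can be reproduced standalone with golang.org/x/time/rate, which backs client-go's default token-bucket limiter. A minimal sketch, assuming a drained bucket and a context whose deadline is nearer than the required wait:

package main

import (
    "context"
    "fmt"
    "time"

    "golang.org/x/time/rate"
)

func main() {
    // A limiter allowing 1 event/sec with burst 1; consume the only
    // token so the next Wait would have to sleep ~1s.
    limiter := rate.NewLimiter(rate.Limit(1), 1)
    limiter.Allow()

    // A context whose deadline arrives before that sleep could finish.
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()

    // Wait fails up front: it can tell the needed delay would outlive
    // the context, without actually sleeping.
    err := limiter.Wait(ctx)
    fmt.Println(err) // rate: Wait(n=1) would exceed context deadline
}

Which would match the report above: once the deadline is nearly exhausted, even a tiny rate-limit wait fails with this message.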

gjenkins8 avatar Jun 26 '23 15:06 gjenkins8

It looks like client-go changed "recently" to decorate this error: https://github.com/kubernetes/client-go/commit/147848c452865ee870eafa8eba223849c40e791c

So perhaps the error message here is new(ish), rather than the error cause.
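So the message has two layers: the inner text from golang.org/x/time/rate, and the prefix added by that commit in client-go. A rough sketch of the wrapping (paraphrased, not the verbatim client-go source):

package main

import (
    "errors"
    "fmt"
)

func main() {
    // Inner error, as produced by golang.org/x/time/rate.
    inner := errors.New("rate: Wait(n=1) would exceed context deadline")

    // client-go's request throttling wraps it with its own prefix
    // (paraphrased from the commit linked above).
    outer := fmt.Errorf("client rate limiter Wait returned an error: %w", inner)

    fmt.Println(outer)
    // client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
}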

gjenkins8 avatar Jun 26 '23 15:06 gjenkins8

I stumbled upon the same error message. After trying back and forth several times, I realized that the timeout had been reached. Increasing the timeout helped me.

E.g. 6 minutes timeout:

helm install/upgrade --timeout 360

@felipecrs I'm not sure, but I think you need to remove the s at the end of the timeout. (I have not tested this)

Thanks for the inputs, but:

  • Raising the timeout doesn't make any difference for me. My timeout is already 1800s (30 minutes), and even when bumping it to 2400s (40 minutes), my install/upgrade command errors at ~20 minutes.
  • --timeout with the s suffix is fine :) (see the sketch below)
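Regarding the suffix: Helm's --timeout is a duration-typed flag, so values go through Go's duration parsing, which requires a unit. A quick sketch, assuming time.ParseDuration semantics:

package main

import (
    "fmt"
    "time"
)

func main() {
    // A unit suffix is required; a bare number does not parse.
    for _, s := range []string{"1800s", "30m", "360"} {
        d, err := time.ParseDuration(s)
        fmt.Printf("%q -> %v, err=%v\n", s, d, err)
    }
    // "1800s" -> 30m0s, err=<nil>
    // "30m" -> 30m0s, err=<nil>
    // "360" -> 0s, err=time: missing unit in duration "360"
}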

felipecrs avatar Jun 26 '23 15:06 felipecrs

(though note there isn't a return statement in the error code block there)

[screenshot of the error-handling code in client-go's request.go]

felipecrs avatar Jun 26 '23 15:06 felipecrs

(though note there isn't a return statement in the error code block there)

Thanks, yeah, I noticed that too after posting, and updated my post hoping to head off any confusion.

gjenkins8 avatar Jun 26 '23 22:06 gjenkins8

Does the --debug flag produce any additional helpful output?

Raising the timeout doesn't make any difference for me. My timeout is already 1800s (30 minutes), and even when bumping it to 2400s (40 minutes), my install/upgrade command errors at ~20 minutes.

I'm not sure. It very much seems the code producing this error is the throttle code for rate-limiting Kubernetes API access. Given those numbers, the calculated delay would need to be >20 mins for this error to appear.
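If genuine API throttling were the cause, the knobs are client-go's QPS and Burst on rest.Config; the burst side appears to be what HELM_BURST_LIMIT feeds. A hedged sketch of raising them in a plain client-go program (kubeconfig path resolution is illustrative):

package main

import (
    "fmt"

    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load a rest.Config from the default kubeconfig location.
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }

    // client-go's defaults are QPS=5, Burst=10; raising them widens the
    // token bucket that produces the error discussed here.
    config.QPS = 100
    config.Burst = 200

    fmt.Printf("QPS=%v Burst=%v\n", config.QPS, config.Burst)
}

But given that raising HELM_BURST_LIMIT made no difference for you, plain throttling seems unlikely.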

gjenkins8 avatar Jun 26 '23 22:06 gjenkins8

Does the --debug flag produce any additional helpful output?

Unfortunately no:

With debug logs enabled:

ready.go:287: [debug] Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

The lines above these are just many more of the same Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready messages.

felipecrs avatar Jun 27 '23 00:06 felipecrs

ready.go:393: [debug] StatefulSet is not ready: default/my-statefulset. 1 out of 2 expected pods are ready
Error: INSTALLATION FAILED: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
INSTALLATION FAILED
main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:147
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/[email protected]/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/[email protected]/command.go:1044
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/[email protected]/command.go:968
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:250
runtime.goexit
        runtime/asm_amd64.s:1598

felipecrs avatar Jul 03 '23 17:07 felipecrs

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions[bot] avatar Nov 03 '23 00:11 github-actions[bot]

My final conclusion about this issue is that it is equivalent to a regular Helm timeout. I would recommend that Helm catch this error and translate it into a regular timeout error message, to avoid confusion for future users.
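To illustrate, a minimal sketch of such a translation — friendlyTimeout is a hypothetical helper, not Helm's actual code:

package main

import (
    "context"
    "errors"
    "fmt"
    "strings"
)

// friendlyTimeout is a hypothetical helper: translate the rate
// limiter's wording into a plain timeout message.
func friendlyTimeout(ctx context.Context, err error) error {
    if err == nil {
        return nil
    }
    // The rate limiter fails *before* the deadline actually passes, so
    // ctx.Err() may still be nil; fall back to inspecting the message.
    if errors.Is(ctx.Err(), context.DeadlineExceeded) ||
        strings.Contains(err.Error(), "would exceed context deadline") {
        return fmt.Errorf("timed out waiting for the release (--timeout reached): %w", err)
    }
    return err
}

func main() {
    waitErr := errors.New("client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline")
    fmt.Println(friendlyTimeout(context.Background(), waitErr))
}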

Any opinions?

felipecrs avatar Nov 04 '23 16:11 felipecrs

This error may occur for various reasons, not only CPU throttling. It could be triggered by a CrashLoopBackOff in a pod while helm upgrade is waiting for a successful deployment. Alternatively, it might result from a failure in a database initialization job, among other possibilities.

My final conclusion about this issue is that it is equivalent to a regular Helm timeout. I would recommend that Helm catch this error and translate it into a regular timeout error message, to avoid confusion for future users.

Any opinions?

Yes, I would agree with that conclusion; it actually looks like a deployment timeout. The current error message is indeed confusing, and I support the suggestion to have Helm catch this error and translate it into a standard timeout message to improve clarity for users.

giuliohome avatar Dec 04 '23 14:12 giuliohome

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions[bot] avatar Mar 04 '24 00:03 github-actions[bot]

Still valid.

felipecrs avatar Mar 04 '24 00:03 felipecrs

My final conclusion about this issue is that it is equivalent to a regular helm timeout.

Can you please quickly summarize why this is the case? (For someone who has lost track of the exact details here 😓, and for future readers of this issue)

gjenkins8 avatar Mar 04 '24 03:03 gjenkins8

I think the simplest way to confirm this is to check how long Helm took to fail with this error. It should match the duration you specified with --timeout.

felipecrs avatar Mar 04 '24 04:03 felipecrs

Facing a similar issue: client rate limiter Wait returned an error: context deadline exceeded. I started facing this when I added a pod anti-affinity rule to my Helm chart.

suyash0103 avatar May 13 '24 17:05 suyash0103

getting the same error. Still valid

sanan-wow avatar May 29 '24 23:05 sanan-wow

Getting the exact same error. Still valid.

SohailLS avatar Jun 20 '24 12:06 SohailLS

Same issue

jbatmalle avatar Jul 26 '24 09:07 jbatmalle

Still valid

stevec-dubber avatar Aug 05 '24 01:08 stevec-dubber

still valid

dinesh-rajaveeran avatar Aug 20 '24 14:08 dinesh-rajaveeran

+1 also experiencing this

guipace avatar Aug 26 '24 13:08 guipace

+1 also experiencing this

kebab-mai-haddi avatar Aug 26 '24 17:08 kebab-mai-haddi

still valid

michael-a-antinucci avatar Aug 31 '24 20:08 michael-a-antinucci

This also suddenly started affecting us today (deploying to AKS 1.29.6 with Helm 3.14.4 running on Bitbucket Pipelines). It does match Helm's --timeout for us. Not a breaking issue once you find out the cause, but given that Helm has its own, much clearer error upon reaching the timeout, this should be fixed to actually display that error.

EndymionWight avatar Sep 04 '24 04:09 EndymionWight

still valid

stevec-dubber avatar Sep 22 '24 23:09 stevec-dubber

+1 still a valid issue.

teshsharma avatar Oct 17 '24 17:10 teshsharma

+1 And still valid

jrwhetse avatar Oct 17 '24 23:10 jrwhetse

+1 still valid

MaroIT avatar Nov 02 '24 12:11 MaroIT