Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
Hello, I'm receiving this error when installing one of my charts:
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
I'm afraid I cannot share the chart itself as it's an internal chart, but searching through the issues in helm I could not find any direct matches. Most of them say context deadline exceeded, like these:
- https://github.com/helm/helm/issues/9761
- https://github.com/helm/helm/issues/7997
But none says would exceed context deadline.
Is there any debug tips someone can share? I don't even know where to begin.
Things I tried already:
- Building Helm with this PR: https://github.com/helm/helm/pull/10715
- Setting HELM_BURST_LIMIT=200
- Building Helm with this PR: https://github.com/felipecrs/helm/pull/2
- (new) Removing --cleanup-on-fail
None changes the result. With debug logs enabled:
ready.go:287: [debug] Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
The Helm command line:
helm upgrade --install my-chart ./my-chart --wait --wait-for-jobs --timeout 1800s --cleanup-on-fail --create-namespace --namespace default --values my-values.yaml --reset-values --history-max 10 --debug
Output of helm version:
version.BuildInfo{Version:"v3.12.1", GitCommit:"f32a527a060157990e2aa86bf45010dfb3cc8b8d", GitTreeState:"clean", GoVersion:"go1.20.5"}
Output of kubectl version:
Client Version: v1.24.12
Kustomize Version: v4.5.4
Server Version: v1.24.12+k3s1
Cloud Provider/Platform (AKS, GKE, Minikube etc.): K3S
I stumbled upon the same error message. After trying back and forth several times, I realized that the timeout had been reached. Increasing the timeout helped me.
E.g. 6 minutes timeout:
helm install/upgrade --timeout 360
@felipecrs I'm not sure, but I think you need to remove the s at the end of the timeout. (I have not tested this)
This error looks to have come from Kubernetes / client-go: https://github.com/kubernetes/client-go/blob/9186f40b189c640d4188387901905c728f291f17/rest/request.go#L615
Given the function name there, I think (guess) that you're being throttled / your API requests are being rate limited. The machinery is detecting that the delay it would need to wait to come back under the API limits would exceed the context's deadline.
It looks like client-go changed "recently" to propagate/decorate this error: https://github.com/kubernetes/client-go/commit/147848c452865ee870eafa8eba223849c40e791c
So perhaps the error message here is new(ish), rather than the error cause.
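To make the origin of the wording concrete, here's a minimal standalone sketch (not Helm or client-go code; the numbers are made up for illustration) using golang.org/x/time/rate, which client-go's token-bucket limiter is built on. Wait refuses up front, without sleeping, when the computed delay would outlive the context:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// One token every 10 seconds, burst of 1.
	limiter := rate.NewLimiter(rate.Every(10*time.Second), 1)
	limiter.Allow() // consume the only available token

	// A context whose deadline (1s) is closer than the ~10s the limiter
	// would need before handing out the next token.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Prints: rate: Wait(n=1) would exceed context deadline
	if err := limiter.Wait(ctx); err != nil {
		fmt.Println(err)
	}
}
```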
I stumbled upon the same error message. After trying back and forth several times, I realized that the timeout had been reached. Increasing the timeout helped me.
E.g. 6 minutes timeout:
helm install/upgrade --timeout 360
@felipecrs I'm not sure, but I think you need to remove the s at the end of the timeout. (I have not tested this)
Thanks for the inputs, but:
- Raising the timeout doesn't make any difference for me. My timeout is already 1800s (30 minutes), and even when bumping it to 2400s (40 minutes), my install/upgrade command errors at ~20 minutes.
- --timeout with the s suffix is fine :)
(though note there isn't a return statement in the error code block there)
(though note there isn't a return statement in the error code block there)
Thanks, yeah, I noticed that too after posting, and updated my post hoping to head off any confusion.
Does the --debug flag produce any additional helpful output?
Raising the timeout doesn't make any difference for me. My timeout is already 1800s (30 minutes), and even when bumping it to 2400s (40 minutes), my install/upgrade command errors at ~20 minutes.
I'm not sure. It very much seems the code producing this error is the throttle code for rate limiting Kubernetes API access. It seems like the calculated delay would need to be >20 minutes, given those numbers, for this error to appear.
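For reference, the client-side throttle in question is the token-bucket limiter that client-go builds from the QPS/Burst values on its rest.Config. Below is a rough sketch of that configuration; the values and host are illustrative, not what Helm actually sets, and as far as I understand the HELM_BURST_LIMIT setting tried above maps onto this kind of knob:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	cfg := &rest.Config{
		Host: "https://example.invalid", // placeholder, not a real cluster
		// QPS/Burst feed the client-side token-bucket throttle; every API
		// request passes through its Wait(ctx) before being sent.
		QPS:   5,
		Burst: 100,
	}

	// Equivalent explicit limiter; this is the "client rate limiter" whose
	// Wait error gets wrapped into the message seen in this issue.
	cfg.RateLimiter = flowcontrol.NewTokenBucketRateLimiter(cfg.QPS, cfg.Burst)
	fmt.Printf("qps=%v burst=%d\n", cfg.QPS, cfg.Burst)
}
```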
Does the --debug flag produce any additional helpful output?
Unfortunately no:
With debug logs enabled:
ready.go:287: [debug] Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready
Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
The lines above these logs are just a bunch more Deployment is not ready: default/my-pod. 0 out of 1 expected pods are ready.
ready.go:393: [debug] StatefulSet is not ready: default/my-statefulset. 1 out of 2 expected pods are ready
Error: INSTALLATION FAILED: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
helm.go:84: [debug] client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
INSTALLATION FAILED
main.newInstallCmd.func2
helm.sh/helm/v3/cmd/helm/install.go:147
github.com/spf13/cobra.(*Command).execute
github.com/spf13/[email protected]/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/[email protected]/command.go:1044
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/[email protected]/command.go:968
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:250
runtime.goexit
runtime/asm_amd64.s:1598
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
My final conclusion about this issue is that it is equivalent to a regular Helm timeout. I would recommend that Helm catch this error and translate it into a regular timeout error message, to avoid confusion for future users.
Any opinions?
This error may occur for various reasons, not only CPU throttling. It could be triggered by a CrashLoopBackOff in a pod while helm upgrade is waiting for a successful deployment, or by a failure in a database initialization job, among other possibilities.
My final conclusion about this issue is that it is equivalent to a regular Helm timeout. I would recommend that Helm catch this error and translate it into a regular timeout error message, to avoid confusion for future users.
Any opinions?
Yes, I would agree with that conclusion; it actually looks like a deployment timeout. The current error message is indeed confusing, and I support the suggestion to have Helm catch this error and translate it into a standard timeout message to improve clarity for users.
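For the record, here is the rough shape of the translation being suggested. This is only a sketch, not Helm's actual error handling; the helper name, the string match, and the wording of the friendlier message are all assumptions:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// translateWaitError is a hypothetical helper that maps the confusing
// rate-limiter wording onto a plain timeout message when the --timeout
// deadline is what was actually hit.
func translateWaitError(err error, timeout string) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, context.DeadlineExceeded) ||
		strings.Contains(err.Error(), "would exceed context deadline") {
		return fmt.Errorf("timed out waiting for the release to become ready (--timeout %s): %w", timeout, err)
	}
	return err
}

func main() {
	orig := errors.New("client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline")
	fmt.Println(translateWaitError(orig, "1800s"))
}
```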
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
Still valid.
My final conclusion about this issue is that it is equivalent to a regular helm timeout.
Can you please quickly summarize why this is the case? (For someone who has lost track of the exact details here 😓, and for future readers of this issue)
I think the simplest way to come to this conclusion is to check how much time Helm took to fail with this error. It should take the same amount of time as you specified with --timeout.
Facing a similar issue: client rate limiter Wait returned an error: context deadline exceeded. I started facing this when I added a pod anti-affinity rule to my Helm chart.
getting the same error. Still valid
getting exact same error. Still valid.
Same issue
Still valid
still valid
+1 also experiencing this
+1 also experiencing this
still valid
This also suddenly started affecting us today (deploying to AKS 1.29.6 with Helm 3.14.4 running on Bitbucket Pipelines). It does match the Helm --timeout for us. It's not breaking behaviour once you find out the cause, but given that Helm has its own, much clearer error upon reaching the timeout, this should be fixed to actually display that error.
still valid
+1 still a valid issue.
+1 And still valid
+1 still valid