Kubernetes client rate-limiting
What happened?
I was running the Kubernetes provider in a debugger and attaching to it using PULUMI_DEBUG_PROVIDERS. I reused the same provider process for numerous deployments, and eventually the provider transitioned into a failure state, apparently due to client-side rate limiting. Once I restarted the provider process, the problem went away.
I decided to file an issue because, though my specific case is exotic, there may be a deeper scalability problem in the provider related to rate-limiting in the kube client. See https://github.com/kubernetes/kubernetes/issues/111880 for more background.
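For background on the client-side throttling involved, here is a generic client-go illustration (not the provider's actual setup): every client built from a rest.Config shares a token-bucket rate limiter, and when QPS and Burst are left at zero, client-go falls back to its defaults of roughly 5 QPS with a burst of 10 for the lifetime of the process.

package main

import (
    "fmt"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a rest.Config the usual way, from the local kubeconfig.
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }

    // QPS and Burst are zero here, so the clientset below gets client-go's
    // default client-side limiter (about 5 QPS, burst 10). Every request made
    // through this clientset shares that single limiter for as long as the
    // process lives.
    fmt.Printf("configured QPS=%v Burst=%v (0 means client-go defaults)\n", config.QPS, config.Burst)

    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }
    _ = clientset
}

A long-lived provider process serving many deployments through one such client is the kind of situation where that shared limiter could become a bottleneck.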
Diagnostics:
  kubernetes:apps/v1:Deployment (deployment):
    error: update of resource "urn:pulumi:dev::issue-xyz::kubernetes:apps/v1:Deployment::deployment" failed because the Kubernetes API server reported that it failed to fully initialize or become live: client rate limiter Wait returned an error: context canceled

  pulumi:pulumi:Stack (issue-xyz-dev):
    error: update failed
Here's the update made just prior to the first rate-limit error; I'd deliberately used an invalid image, nginxfoo.
Diagnostics:
  kubernetes:apps/v1:Deployment (deployment):
    warning: Refreshed resource is in an unhealthy state:
    * Resource 'mydeployment' was created but failed to initialize
    * Minimum number of Pods to consider the application live was not attained
    * [Pod eron/mydeployment-65df56c569-dnqzh]: containers with unready status: [nginx]
    error: update of resource "urn:pulumi:dev::issue-2455::kubernetes:apps/v1:Deployment::deployment" failed because the Kubernetes API server reported that it failed to fully initialize or become live: Resource operation was cancelled for "mydeployment"
Example
name: issue-2942
runtime: yaml
description: A minimal Kubernetes Pulumi YAML program
config:
  pulumi:tags:
    value:
      pulumi:template: kubernetes-yaml
outputs:
  name: ${deployment.metadata.name}
resources:
  deployment:
    properties:
      metadata:
        name: mydeployment
      spec:
        replicas: 1
        selector:
          matchLabels: ${appLabels}
        template:
          metadata:
            labels: ${appLabels}
          spec:
            containers:
              - image: nginx
                name: nginx
                env:
                  - name: DEMO_GREETING
                    value: "16"
    type: kubernetes:apps/v1:Deployment
variables:
  appLabels:
    app: nginx
Output of pulumi about
CLI
Version      3.108.1
Go Version   go1.22.0
Go Compiler  gc

Plugins
NAME        VERSION
kubernetes  unknown
yaml        unknown

Host
OS       darwin
Version  14.4.1
Arch     arm64
Additional context
No response
Contributing
Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
Here's what happened in my case: the provider was sent a Cancel RPC, which canceled the provider's internal context. On subsequent requests, the kube client's rate limiter is the first code path to encounter the canceled context, which is why the failure surfaces as a rate-limiter error.
Two possible follow-ups:
- double-check the QPS settings
- teach the provider to reset the cancellation signal when it receives the Configure RPC (a rough sketch of the idea follows this list)
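To make the second idea concrete, here is a hypothetical, simplified sketch; none of these types exist in pulumi-kubernetes, it only illustrates "cancel the shared context on Cancel, mint a fresh one on Configure":

package main

import (
    "context"
    "sync"
)

// cancelableProvider is a hypothetical stand-in for the provider's gRPC
// server: it owns one shared context that Cancel cancels and Configure
// resets, so a canceled context from an earlier session does not poison
// later requests.
type cancelableProvider struct {
    mu     sync.Mutex
    ctx    context.Context
    cancel context.CancelFunc
}

func newCancelableProvider() *cancelableProvider {
    ctx, cancel := context.WithCancel(context.Background())
    return &cancelableProvider{ctx: ctx, cancel: cancel}
}

// Cancel mirrors the Cancel RPC: it cancels the provider-wide context.
func (p *cancelableProvider) Cancel() {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.cancel()
}

// Configure mirrors the Configure RPC: if the shared context has already
// been canceled, replace it with a fresh one so later Kubernetes client
// calls aren't rejected immediately with "context canceled".
func (p *cancelableProvider) Configure() {
    p.mu.Lock()
    defer p.mu.Unlock()
    if p.ctx.Err() != nil {
        p.ctx, p.cancel = context.WithCancel(context.Background())
    }
}

// requestContext is what per-resource operations would derive their
// contexts from.
func (p *cancelableProvider) requestContext() context.Context {
    p.mu.Lock()
    defer p.mu.Unlock()
    return p.ctx
}

func main() {
    p := newCancelableProvider()
    p.Cancel()             // engine detaches / cancels
    p.Configure()          // a later attachment reconfigures the provider
    _ = p.requestContext() // fresh, non-canceled context for new requests
}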
The low-level throttling code is here:
https://github.com/kubernetes/client-go/blob/46588f2726fa3e25b1704d6418190f424f95a990/rest/request.go#L986-L991
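That code calls the configured rate limiter's Wait with the request context and wraps any error as "client rate limiter Wait returned an error: ...", which matches the diagnostics above. Here is a small standalone reproduction of the failure mode, using client-go's flowcontrol package directly rather than the provider's client stack:

package main

import (
    "context"
    "fmt"

    "k8s.io/client-go/util/flowcontrol"
)

func main() {
    // The same kind of token-bucket limiter client-go installs by default
    // (5 QPS, burst 10).
    limiter := flowcontrol.NewTokenBucketRateLimiter(5, 10)

    ctx, cancel := context.WithCancel(context.Background())
    cancel() // simulate the provider-wide context after a Cancel RPC

    // With an already-canceled context, Wait fails immediately; in client-go
    // this is the error that gets wrapped into the message seen above.
    if err := limiter.Wait(ctx); err != nil {
        fmt.Println("Wait failed:", err) // Wait failed: context canceled
    }
}

In other words, the throttling path isn't necessarily slow here; it's simply the first place the canceled context gets observed.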
Is there another alternative where we generously bump the QPS ceiling if running under debug? A quick workaround like that might be prudent if this is impacting the debug loop but not end-users.
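If we went that route, a sketch of what it could look like is below; the environment variable name is made up for illustration, and the numbers are arbitrary.

package main

import (
    "fmt"
    "os"

    "k8s.io/client-go/rest"
)

// applyDebugThrottling loosens client-go's client-side throttling when the
// provider is being debugged. PULUMI_K8S_DEBUG_QPS_BUMP is a placeholder
// name, not an existing provider flag.
func applyDebugThrottling(config *rest.Config) {
    if os.Getenv("PULUMI_K8S_DEBUG_QPS_BUMP") == "" {
        return
    }
    // Well above the client-go defaults (5 QPS, burst 10), so a long-lived
    // debug session doesn't trip client-side rate limiting.
    config.QPS = 500
    config.Burst = 1000
}

func main() {
    cfg := &rest.Config{}
    applyDebugThrottling(cfg)
    fmt.Printf("QPS=%v Burst=%v\n", cfg.QPS, cfg.Burst)
}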
Related: https://github.com/pulumi/pulumi-kubernetes/pull/1748