kubectl wait timeout argument is poorly documented and ill-suited to waiting on multiple resources
This is just a re-submit of https://github.com/kubernetes/kubectl/issues/754 which, despite being confirmed & assigned, was closed as stale without any fix.
What happened:
Run kubectl wait with a selector matching more than one resource and a timeout
What you expected to happen:
The timeout should apply to the wait command, not to the individual resources.
Because the timeout is applied to each resource sequentially, waiting on more than one resource with any kind of timeout is basically unusable.
How to reproduce it (as minimally and precisely as possible):
- Create a deployment scaled to 2 or more replicas, and a label that can be used to match it
- Run: kubectl wait pod --selector=... --timeout=30s
- Observe that this runs for N*30s, where N is the number of pods
Anything else we need to know?:
cc @eranreshef, the original reporter, and @JabusKotze, who assigned the prior issue to themselves
/sig cli
Here is a way to reproduce:
kubectl apply -f - << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dtest
  name: dtest
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dtest
  template:
    metadata:
      labels:
        app: dtest
    spec:
      containers:
      - name: bb
        image: busybox
        command: ["/bin/sh", "-c", "sleep infinity"]
EOF
time kubectl wait pod --selector=app=dtest --for=condition=ItWillNeverBeThis --timeout=5s
Output:
timed out waiting for the condition on pods/dtest-56c46b55dd-7tq8r
timed out waiting for the condition on pods/dtest-56c46b55dd-hg7x9
real 0m10.083s
user 0m0.112s
sys 0m0.011s
^ shows that the command took 10s (because replicas=2) even though the timeout itself was only 5s.
/triage accept
This was discussed on the bug scrub today and we agree that this is not good behavior. To solve this we will need to implement either contexts or goroutines to run these waiters in parallel to more appropriately match the user expectation here.
@mpuckett159: The label(s) triage/accept cannot be applied, because the repository doesn't have them.
In response to this:
/triage accept
This was discussed on the bug scrub today and we agree that this is not good behavior. To solve this we will need to implement either contexts or goroutines to run these waiters in parallel to more appropriately match the user expectation here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/triage accepted whoops
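For illustration only, here is a minimal Go sketch of the goroutines-plus-shared-context approach mentioned above. It is not kubectl's actual implementation; the resource names and the simulated condition check are placeholders.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// waitForResource stands in for the per-resource condition check; in kubectl it
// would watch the object until the requested condition is met or ctx expires.
func waitForResource(ctx context.Context, name string) error {
	select {
	case <-time.After(2 * time.Second): // pretend the condition became true
		return nil
	case <-ctx.Done():
		return fmt.Errorf("timed out waiting for the condition on %s", name)
	}
}

func main() {
	// Placeholder resource names; in kubectl these would come from the selector.
	resources := []string{"pods/dtest-aaa", "pods/dtest-bbb"}

	// A single timeout for the whole command, shared by every waiter.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	var wg sync.WaitGroup
	errs := make(chan error, len(resources))
	for _, r := range resources {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			errs <- waitForResource(ctx, name)
		}(r)
	}
	wg.Wait()
	close(errs)

	for err := range errs {
		if err != nil {
			fmt.Println(err)
		}
	}
}
Because every waiter selects on the same ctx.Done(), the command as a whole returns after at most one timeout period, no matter how many resources the selector matched.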
Hello,
We have a similar problem, expecting kubectl wait to wait for X seconds in total with "--timeout=Xs", e.g.:
kubectl wait --for=condition=available --timeout=10m deployment --all
However, it waits for X seconds multiplied by the number of deployments with not-ready pods. Could you please also consider our scenario in the fix?
Kind Regards,
Vitaly
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
If no one is willing to take it, I can work on it.
/assign
Workaround for those using kubectl or oc before v1.27
You can use the timeout command in front of kubectl wait or oc wait. For example, with an overall limit of at most 305s (the timeout given to the timeout command should be a little larger than the timeout of the kubectl command):
timeout $((300+5)) kubectl wait --for=condition=Ready --all pod --timeout=300s
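One caveat, assuming GNU coreutils timeout: if the outer timeout fires first, the command exits with status 124 rather than kubectl's own exit code, so scripts that check the exit status should handle both cases.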