
kubectl wait timeout argument is poorly documented and ill-suited to waiting on multiple resources

Open mgabeler-lee-6rs opened this issue 3 years ago • 6 comments

This is just a re-submit of https://github.com/kubernetes/kubectl/issues/754 which, despite being confirmed & assigned, was closed as stale without any fix.

What happened:

Run kubectl wait with a selector matching more than one resource and a timeout

What you expected to happen:

The timeout should apply to the wait command, not to the individual resources.

Because the timeout applies to each matched resource sequentially, waiting on more than one resource with any kind of timeout is basically unusable.

How to reproduce it (as minimally and precisely as possible):

  1. Create a deployment scaled to 2 or more replicas, with a label that can be used to select its pods
  2. Run: kubectl wait pod --selector=... --for=condition=SomeConditionThatIsNeverMet --timeout=30s
  3. Observe that this runs for N*30s, where N is the number of matching pods, rather than 30s

Anything else we need to know?:

cc @eranreshef the original reporter and @JabusKotze who assigned the prior issue to themselves

mgabeler-lee-6rs avatar May 25 '22 15:05 mgabeler-lee-6rs

/sig cli

ardaguclu avatar May 26 '22 12:05 ardaguclu

Here is a way to reproduce:

kubectl apply -f - << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dtest
  name: dtest
spec:
  replicas: 2 
  selector:
    matchLabels:
      app: dtest
  template:
    metadata:
      labels:
        app: dtest
    spec:
      containers:
      - name: bb
        image: busybox
        command: ["/bin/sh", "-c", "sleep infinity"]
EOF

time kubectl wait pod --selector=app=dtest --for=condition=ItWillNeverBeThis --timeout=5s

Output:

timed out waiting for the condition on pods/dtest-56c46b55dd-7tq8r
timed out waiting for the condition on pods/dtest-56c46b55dd-hg7x9

real	0m10.083s
user	0m0.112s
sys	0m0.011s

The above shows the command took 10s (because replicas=2) when the timeout itself was only supposed to be 5s.

brianpursley avatar Jun 14 '22 17:06 brianpursley

/triage accept

This was discussed on the bug scrub today and we agree that this is not good behavior. To solve this we will need to implement either contexts or goroutines to run these waiters in parallel to more appropriately match the user expectation here.
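
For illustration only (not kubectl's planned implementation), the same idea can be sketched at the shell level: start one kubectl wait per resource in the background so the overall wall-clock time is bounded by a single --timeout rather than timeout multiplied by the number of resources. The pod names below are placeholders taken from the reproduction above.

# Start one waiter per pod in the background; each gets the same 30s budget.
for pod in dtest-56c46b55dd-7tq8r dtest-56c46b55dd-hg7x9; do
  kubectl wait "pod/$pod" --for=condition=Ready --timeout=30s &
done
# Wait for all background waiters; total wall-clock time is ~30s, not 30s per pod.
# Note: a plain "wait" does not aggregate the individual waiters' exit codes.
wait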

mpuckett159 avatar Jun 22 '22 21:06 mpuckett159

@mpuckett159: The label(s) triage/accept cannot be applied, because the repository doesn't have them.

In response to this:

/triage accept

This was discussed on the bug scrub today and we agree that this is not good behavior. To solve this we will need to implement either contexts or goroutines to run these waiters in parallel to more appropriately match the user expectation here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 22 '22 21:06 k8s-ci-robot

/triage accepted whoops

mpuckett159 avatar Jun 22 '22 21:06 mpuckett159

Hello,

We have a similar problem: we expect kubectl wait to wait for X seconds in total with "--timeout=Xs", e.g.:

kubectl wait --for=condition=available --timeout=10m deployment --all

However, it waits for X seconds multiplied by the number of deployments with not-ready pods. Could you please also consider our scenario in the fix?

Kind Regards,

Vitaly

vitalyrychkov avatar Sep 09 '22 14:09 vitalyrychkov

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 08 '22 14:12 k8s-triage-robot

/remove-lifecycle stale

If no one is willing to take it, I can work on it.

ardaguclu avatar Dec 08 '22 15:12 ardaguclu

/assign

ardaguclu avatar Dec 16 '22 10:12 ardaguclu

Workaround for those using kubectl or oc before v1.27: you can prefix kubectl wait or oc wait with the timeout command. For example, with an overall limit of 305s (the timeout given to the timeout command should be slightly larger than the timeout given to kubectl):

timeout $((300+5)) kubectl wait --for=condition=Ready --all pod --timeout=300s
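
If you need to tell the two failure modes apart, GNU timeout exits with status 124 when it kills the command, so a sketch along these lines (same flags as above) distinguishes an exceeded overall budget from other kubectl wait failures:

# Run the wait under an overall 305s budget, then inspect the exit status.
timeout $((300+5)) kubectl wait --for=condition=Ready --all pod --timeout=300s
status=$?
if [ "$status" -eq 124 ]; then
  echo "overall 305s budget exceeded" >&2
elif [ "$status" -ne 0 ]; then
  echo "kubectl wait failed (exit $status)" >&2
fi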

R-Studio avatar Jul 20 '23 07:07 R-Studio