
`kubectl wait` for a non-existent resource.

Open · kvokka opened this issue 5 years ago · 54 comments

What happened:

I cannot avoid exiting with an error for a resource that has not been created yet, which is misleading given the command's name:

kubectl wait --selector=foo=bar --for=condition=complete jobs
kubectl wait --for=condition=complete jobs/foo

Even if this behavior is intentional, the user should have an option to continue waiting instead of exiting with an error code.

In contrast, if the resource already exists, everything works as it should.

What you expected to happen:

kubectl should at least have an option to wait for a resource that does not exist yet.
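In the meantime, retrying `kubectl wait` in a shell loop can approximate this behaviour (a minimal sketch, assuming bash; the job name foo and the 300s deadline are placeholders taken from the example above):

# Retry `kubectl wait` until it succeeds or an overall deadline passes.
deadline=$((SECONDS + 300))
until kubectl wait --for=condition=complete job/foo --timeout=30s 2>/dev/null; do
  if (( SECONDS >= deadline )); then
    echo "timed out waiting for job/foo" >&2
    exit 1
  fi
  sleep 5
done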

Anything else we need to know?:

Related: kubernetes/kubernetes#75227

Environment:

  • Kubernetes version (use kubectl version): 1.15.2
  • OS (e.g: cat /etc/os-release): macOS 10.14.6
  • Kernel (e.g. uname -a): Darwin Kernel Version 18.7.0

kvokka avatar Sep 27 '19 12:09 kvokka

/sig cli

kvokka avatar Sep 27 '19 12:09 kvokka

I can take a look into this.

/assign @rikatz

rikatz avatar Sep 27 '19 22:09 rikatz

@kvokka just a question: are deleted conditions also something you're looking for?

I can imagine a situation where you want a deleted condition for an object that wasn't even created. That seems pretty strange to me :) but let me know if this is also a scenario.

Tks

rikatz avatar Sep 30 '19 13:09 rikatz

@rikatz Thank you for your response!

For me, simply waiting until the timeout is more than enough. If the developer wants to control the object's persistence/deletion, just let them do it. Sounds reasonable?

The example scenario of the expected behaviour is described in this article.

kvokka avatar Sep 30 '19 13:09 kvokka

Right.

It might be trickier than I thought, but I'm already taking a look.

The biggest problem is that the function used to 'visit' an object expects it to exist (ResourceFinder.Do().Visit), so I'm checking whether it's possible to 'bypass/loop' around it ;)

rikatz avatar Sep 30 '19 13:09 rikatz

I've made an initial and dirty PR just to see if this is the path to follow :D

rikatz avatar Sep 30 '19 20:09 rikatz

Thank you for the contribution! I hope the code will be reviewed/merged soon! :)

kvokka avatar Oct 01 '19 04:10 kvokka

OK, so I need some review :/ I did it the dumbest way... a sleep of 1s. I'm not sure why ResourceFinder is used here, or whether something more "flexible" could be used, so I need someone with more experience in the Kubernetes code to review that.

rikatz avatar Oct 02 '19 20:10 rikatz

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Dec 31 '19 21:12 fejta-bot

/remove-lifecycle stale

Got some time to resolve some other stuff, but this is still a thing

rikatz avatar Jan 02 '20 20:01 rikatz

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Apr 01 '20 21:04 fejta-bot

/remove-lifecycle stale

jemc avatar Apr 01 '20 21:04 jemc

Had the same problem. I am using a script to create all my k8s objects, then waiting for a particular pod to be ready.

I have a race condition: when the wait executes, the object apparently is not yet created.

This issue forces me to sleep for a few seconds before issuing the wait.

kubectl apply -f foo.yaml
sleep 5 # Just to avoid the wait err below
kubectl -n ns wait pod --for=condition=ready -l name=pod --timeout=120s
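One way to avoid the fixed sleep is to poll until a matching pod exists before waiting (a sketch, not a kubectl feature; it assumes bash and reuses the label and namespace from the snippet above):

kubectl apply -f foo.yaml
# Poll until at least one pod matches the label (instead of a fixed sleep),
# then wait on its condition as before.
until [ -n "$(kubectl -n ns get pod -l name=pod -o name 2>/dev/null)" ]; do
  sleep 2
done
kubectl -n ns wait pod --for=condition=ready -l name=pod --timeout=120s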

lopezator avatar Jun 05 '20 09:06 lopezator

Would it make sense to have another command which simply polls for a pod to exist? Then wait could optionally call that command if a knob is provided, and allow it to 'share' its timeout. That sleep is a real bummer :)

Another issue is that if you're searching for a set of pods with a particular label, wait declares success as soon as the first pod with that label is up and ready, even if the second one hasn't been created yet by that time.

Here's an example. Both example-connectivity-domain and example-connectivity-domain-wcmd-deployment have the label sought in the condition. Success is declared although the latter is still being created. On the other hand, I don't want it waiting until the timeout in case additional pods are still coming. Maybe a matching resource count would be useful? If the count is supplied, keep polling for the whole period or until all $count resources exist and match the condition (a rough sketch of that idea follows the log below).

04-Aug [21:18:53.875] + kubectl --kubeconfig=/home/app-net-jenkins/kubeconfigs/central/kind-1.kubeconfig apply -f helm-template-example-connectivity-domain
04-Aug [21:19:00.542] connectivitydomain.cnns.cisco.com/example-connectivity-domain created
04-Aug [21:19:00.542] + sleep 5
04-Aug [21:19:04.808] + echo 'Waiting for pod to be Ready'
04-Aug [21:19:04.808] Waiting for pod to be Ready
04-Aug [21:19:04.808] + kubectl wait --kubeconfig=/home/app-net-jenkins/kubeconfigs/central/kind-1.kubeconfig --timeout=300s --for condition=Ready -l connectivitydomain=example-connectivity-domain -n=default pod
04-Aug [21:19:23.176] pod/example-connectivity-domain-64fcc4cc86-7jt2h condition met
04-Aug [21:19:23.176] + kubectl get pods -A --kubeconfig=/home/app-net-jenkins/kubeconfigs/central/kind-1.kubeconfig
04-Aug [21:19:23.176] NAMESPACE   NAME                                                            READY   STATUS              RESTARTS   AGE
04-Aug [21:19:23.176] default     connectivity-domain-operator-56dbb5977c-mh5dp                   1/1     Running             0          10m
04-Aug [21:19:23.176] default     etcd-operator-84cf6bc5d5-kf5dc                                  1/1     Running             0          12m
04-Aug [21:19:23.176] default     example-connectivity-domain-64fcc4cc86-7jt2h                    1/1     Running             0          20s
04-Aug [21:19:23.176] default     example-connectivity-domain-wcmd-deployment-6847bcff77-stvr2   0/1     ContainerCreating   0          9s
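A rough sketch of the count-based idea, not an existing kubectl flag (assumes bash, and that the expected number of labelled pods is known in advance, here 2):

expected=2   # how many pods should eventually carry the label
deadline=$((SECONDS + 300))
# Poll until the expected number of labelled pods exists, then wait on their condition.
while [ "$(kubectl get pod -n default -l connectivitydomain=example-connectivity-domain -o name 2>/dev/null | grep -c .)" -lt "$expected" ]; do
  if (( SECONDS >= deadline )); then echo "timed out waiting for $expected pods" >&2; exit 1; fi
  sleep 5
done
kubectl wait --timeout=300s --for condition=Ready -l connectivitydomain=example-connectivity-domain -n default pod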

sfph avatar Aug 05 '20 17:08 sfph

FYI that I've hit this issue for a node type as well:

+ kubectl --kubeconfig /tmp/targetkubeconfig wait --for=condition=Ready node --all --timeout 900s
error: no matching resources found

It appears to have been a race condition for me as well (I was looping on the command above, waiting for the apiserver to become ready). On retesting it worked fine.
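For reference, the retry loop described above can be as simple as the following (a sketch, assuming bash; the kubeconfig path is the one from the log above):

# Retry until the node objects exist and all report Ready.
until kubectl --kubeconfig /tmp/targetkubeconfig wait --for=condition=Ready node --all --timeout 900s; do
  sleep 10
done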

mattmceuen avatar Aug 06 '20 19:08 mattmceuen

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Nov 04 '20 19:11 fejta-bot

/remove-lifecycle stale

sfph avatar Nov 04 '20 20:11 sfph

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Feb 02 '21 21:02 fejta-bot

/remove-lifecycle stale

jakerobb avatar Feb 02 '21 22:02 jakerobb

Here's the thing I use in the meantime:

while : ; do
  kubectl get [your thing] && break
  sleep 5
done

Adjust the sleep duration if you like. Redirect stdout/stderr if you like. And, of course, add a timeout if you like. This is normally followed by waiting for a condition on the thing that we now know exists.
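For example, a timeout variant might look like this (a sketch, assuming bash; `[your thing]` is a placeholder for whatever resource you are waiting for, as above):

deadline=$((SECONDS + 300))
while : ; do
  kubectl get [your thing] && break
  if (( SECONDS >= deadline )); then
    echo "timed out waiting for [your thing]" >&2
    exit 1
  fi
  sleep 5
done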

jakerobb avatar Feb 02 '21 22:02 jakerobb

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar May 03 '21 22:05 fejta-bot

/remove-lifecycle stale

jemc avatar May 04 '21 18:05 jemc

@jakerobb kubectl get ... && break will not work because kubectl will exit 0 when no resources are found.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:19:55Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl get pod
No resources found in default namespace.
$ echo $?
0

jhoblitt avatar Jul 14 '21 17:07 jhoblitt

@jakerobb kubectl get ... && break will not work because kubectl will exit 0 when no resources are found.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:19:55Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl get pod
No resources found in default namespace.
$ echo $?
0

It will exit with 1 if you specify a resource name:

❯ kubectl get pod
No resources found in default namespace.

~
❯ echo $?
0

~
❯ kubectl get pod sdas
Error from server (NotFound): pods "sdas" not found

~
❯ echo $?
1

gandazgul avatar Jul 14 '21 21:07 gandazgul

@gandazgul What is your kubectl version?

$ kubectl get pod which-does-not-exist
Error from server (NotFound): pods "which-does-not-exist" not found
$ echo $?
1

jhoblitt avatar Jul 15 '21 17:07 jhoblitt

I wanted to drop a use case here that is a little different from the other ones mentioned. I am using OPA/Gatekeeper to do validation on k8s resources both at build time and at runtime. The way Gatekeeper works is that you specify constraint templates and then define implementations of those constraint templates as unique kinds. The problem here is that when you define a constraint template, it creates a new CRD that the implementing constraint then uses.

I need to wait for all the constraint template CRD kinds to be available in the cluster as they are reconciled by the controller. This is an external dependency inside the cluster that I don't have control over, except for the fact that they load CRDs into the API server. I've implemented a while-loop wait method, but I would love to see a wait --for exists with a timeout. This is explicitly a different use case from wait --for condition, because the latter expects the object to already be available in order to satisfy the condition.
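For the CRD case, one workaround sketch (assuming bash; the CRD name below is a hypothetical example, substitute whichever CRDs your constraint templates create) is to poll until the CRD object exists, then lean on the Established condition that CRDs expose:

crd="k8srequiredlabels.constraints.gatekeeper.sh"   # hypothetical constraint CRD name
# Poll until the CRD object exists, then wait for it to become Established.
until kubectl get crd "$crd" >/dev/null 2>&1; do
  sleep 5
done
kubectl wait --for condition=established crd "$crd" --timeout=120s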

jmcshane avatar Jul 16 '21 12:07 jmcshane

@gandazgul What is your kubectl version?

$ kubectl get pod which-does-not-exist
Error from server (NotFound): pods "which-does-not-exist" not found
$ echo $?
1

1.18 for my post example, but I tried with 1.20 and got the same behavior. get pods exits with 0 even if there are no pods, whereas get pod name-doesnt-exist exits with 1, which I think makes sense.

I was responding to @jakerobb about how to do a while loop to wait for the pod to exist.
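For label selectors, where the exit status stays 0 even with no matches, the loop can test for non-empty output instead (a sketch, assuming bash and the label/namespace from lopezator's example above):

# `kubectl get -l ...` exits 0 even with no matches, so check the output instead.
while : ; do
  [ -n "$(kubectl -n ns get pod -l name=pod -o name 2>/dev/null)" ] && break
  sleep 5
done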

gandazgul avatar Jul 19 '21 17:07 gandazgul

I wanted to drop out a use case here that is a little different that the other ones mentioned. I am using OPA/gatekeeper to do validation on k8s resources both at build time and runtime. The way gatekeeper works is that you specify constraint templates and then define implementations of those constraint templates in unique kinds. The problem here is that when you define a constraint template, it creates a new CRD instance that the implementing constraint then uses.

I need to wait for all the constraint template CRD kinds to be available in the cluster as they are reconciled by the controller. This is an external dependency inside the cluster that I don't have control over except for the fact that they are loading CRDs to the API server. I've implemented a while loop --wait method, but I would love to see a wait --for exists with a timeout. This is explicitly a different use case than wait --for condition because you expect that the object is available to be able to satisfy the condition.

We have the exact same use case (OPA Gatekeeper). We are also applying it via Terraform, so a bash wait loop is not a viable alternative for us.

Additionally, we configure Cilium with Dataplane V2 on GKE (the details aren't specifically important; what is important is that, at some point, it creates a NetworkLogging resource that I need to patch/update). But I can't patch it until it's been created by the networking controller.

rwkarg avatar Aug 06 '21 00:08 rwkarg

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 04 '21 01:11 k8s-triage-robot

/remove-lifecycle stale

jemc avatar Nov 04 '21 15:11 jemc