
cilium connectivity test may take a long time to fail out if it can't deploy pods

Open joestringer opened this issue 4 years ago • 2 comments

If you have a situation where pods cannot be deployed, cilium connectivity test appears to hang:

$ ./cilium connectivity test
ℹ️  Single node environment detected, enabling single node connectivity test
✨ [microk8s-cluster] Creating namespace for connectivity check...
✨ [microk8s-cluster] Deploying echo-same-node service...
✨ [microk8s-cluster] Deploying client service...
⌛ [microk8s-cluster] Waiting for deployments [client echo-same-node] to become ready...

Eventually it does time out:

Error: Connectivity test failed:  waiting for deployment client to become ready has been interrupted: context deadline exceeded

The above would be due to the long timeout here:

https://github.com/cilium/cilium-cli/blob/e9bc3cfdbbc40855a4bf0b2bab61123808c6bbf9/connectivity/check.go#L665

In the meantime, I can see why the pods are not being deployed:

$ k -n  cilium-test describe pod echo-same-node-97cd54966-bqz2v | tail -n 4
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  13s (x7 over 4m51s)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
  Warning  FailedScheduling  3s                   default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity rules, 1 node(s) didn't match pod affinity/anti-affinity.

It would be convenient if the CLI could detect this kind of case more quickly and hint at why nothing is moving, particularly when the deployment is making no progress, as the events above show. This could be as simple as setting a 30s timer that prints a message such as "Try running kubectl -n cilium-test get pods to see whether the test is making progress", and cancelling that timer once the deployment wait succeeds.

joestringer, Jan 28 '21 23:01

I hit the same issue, and what's worse is that no pods are created at all:

kubectl get events -n cilium-test shows no output
kubectl -n cilium-test get pods shows no output
kubectl -n cilium-test describe deployment client shows no error

dreamerlzl, Aug 18 '22 14:08

I hit the same issue under more specific conditions: on rke2 with cis-1.23 enabled, running the command prints:

W0731 21:45:02.604008 1684936 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true), hostPort (container "echo-external-node" uses hostPort 8080), allowPrivilegeEscalation != false (container "echo-external-node" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "echo-external-node" must set securityContext.capabilities.drop=["ALL"]; container "echo-external-node" must not include "NET_RAW" in securityContext.capabilities.add), runAsNonRoot != true (pod or container "echo-external-node" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "echo-external-node" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

soulwhisper, Aug 01 '23 01:08