Connectivity test pods may be unschedulable in a two-node cluster

Open giorio94 opened this issue 1 year ago • 2 comments

Hit on a CI run: https://github.com/cilium/cilium-cli/actions/runs/7259322671/job/19826588396?pr=2194

The test cluster has four nodes, but Cilium is deployed on only two of them, while the other two are reserved for extra tests. This is equivalent to a two-node cluster.

Pods:

NAMESPACE            NAME                                                  READY   STATUS    RESTARTS   AGE     IP             NODE                          
test-namespace       client-5f8f776644-pxh7p                               0/1     Pending   0          5m1s    <none>         <none>                        
test-namespace       client2-868c49bf66-48rfv                              0/1     Pending   0          5m1s    <none>         <none>                        
test-namespace       client3-674cf46fd5-cf9d2                              1/1     Running   0          5m2s    10.244.0.89    chart-testing-control-plane   
test-namespace       echo-external-node-6c447f645f-xwx4l                   1/1     Running   0          5m1s    172.18.0.4     chart-testing-worker3         
test-namespace       echo-other-node-757cbff7b6-pq9bv                      2/2     Running   0          5m2s    10.244.3.46    chart-testing-worker          
test-namespace       echo-same-node-6cc6494564-sxngz                       0/2     Pending   0          5m1s    <none>         <none>                        

Specifically:

  • The echo-other-node pod is scheduled on chart-testing-worker, with a required anti-affinity targeting the client pod;
  • The client3 pod is scheduled on chart-testing-control-plane, with a required anti-affinity targeting the client pod.

This makes it impossible to schedule the client pod, as both ready nodes are forbidden by the anti-affinity rules:

0/4 nodes are available: 2 node(s) didn't match pod affinity/anti-affinity,
2 node(s) didn't satisfy existing pods anti-affinity rules, 2 node(s) had
taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

In turn, the client2 and echo-same-node pods are also unschedulable, because they have a required affinity targeting the client pod.
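To make the deadlock easier to reason about, here is a minimal sketch of the constraints described above. It is a toy model, not the Kubernetes scheduler: it assumes a per-node topology, treats required anti-affinity as symmetric, and the function names and the extra node `extra-node` are hypothetical. The affinity rules are written as plain pod-name sets for illustration, not taken from the actual manifests.

```python
from typing import Dict, List, Set, Tuple

READY_NODES = ["chart-testing-control-plane", "chart-testing-worker"]

# Pods already running, as in the listing above.
PLACED: Dict[str, str] = {
    "client3": "chart-testing-control-plane",
    "echo-other-node": "chart-testing-worker",
}

# Required anti-affinity: pod -> pods it must not share a node with.
ANTI_AFFINITY: Dict[str, Set[str]] = {
    "client3": {"client"},
    "echo-other-node": {"client"},
}
# Required affinity: pod -> pods it must share a node with.
AFFINITY: Dict[str, Set[str]] = {
    "client2": {"client"},
    "echo-same-node": {"client"},
}


def feasible_nodes(pod: str, placed: Dict[str, str], nodes: List[str]) -> List[str]:
    """Return the nodes on which `pod` satisfies all required rules."""
    result = []
    for node in nodes:
        residents = {p for p, n in placed.items() if n == node}
        # A resident pod's anti-affinity rule rejects the incoming pod.
        if any(pod in ANTI_AFFINITY.get(r, set()) for r in residents):
            continue
        # The incoming pod's own anti-affinity rejects a resident pod.
        if ANTI_AFFINITY.get(pod, set()) & residents:
            continue
        # Every required affinity target must already be on this node.
        if not AFFINITY.get(pod, set()) <= residents:
            continue
        result.append(node)
    return result


def schedule(
    pending: List[str], placed: Dict[str, str], nodes: List[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Place pending pods in order; return the placement and the leftovers."""
    placed = dict(placed)
    unschedulable = []
    for pod in pending:
        candidates = feasible_nodes(pod, placed, nodes)
        if candidates:
            placed[pod] = candidates[0]
        else:
            unschedulable.append(pod)
    return placed, unschedulable


if __name__ == "__main__":
    pending = ["client", "client2", "echo-same-node"]
    print(schedule(pending, PLACED, READY_NODES)[1])
    print(schedule(pending, PLACED, READY_NODES + ["extra-node"])[1])
```

In this model, all three pending pods stay unschedulable with the two ready nodes, while adding a third ready node lets all of them land together on that node.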

The client3 pod was recently introduced in https://github.com/cilium/cilium-cli/pull/2183.

giorio94 · Dec 21 '23 08:12