calico pods report an error of `no route to host`

Open weironz opened this issue 1 year ago • 5 comments

Expected Behavior

When `irqaffinity` is configured in /etc/default/grub, the calico pods should run normally.

Current Behavior

When `irqaffinity` is configured in /etc/default/grub, the calico pods go into CrashLoopBackOff and report a `no route to host` error.

[image: what changed, and calico apiserver logs]

[image: calico kube-controller logs]

Possible Solution

Removing the `irqaffinity` kernel parameter restores normal calico operation, but we need this parameter for CPU isolation to improve performance.

Steps to Reproduce (for bugs)

  1. Install kubernetes and calico with kubeadm; the cluster and calico are running.
  2. Configure the irqaffinity=0,10 kernel option.
  3. Reboot the kubernetes node.
  4. The calico pods go into CrashLoopBackOff and report a no route to host error.

root@node31:~# cat /etc/default/grub | grep GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="irqaffinity=0,6 noirqbalance intel_iommu=on iommu=pt"
root@node31:~# 
root@node31:~# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-5.15.0-102-generic root=/dev/mapper/ubuntu--vg-lv--0 ro irqaffinity=0,10 noirqbalance intel_iommu=on iommu=pt
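
The two outputs above show different values (0,6 in /etc/default/grub, 0,10 in /proc/cmdline), so it may be worth comparing them mechanically. The following is a minimal sketch and not part of the original report; the function name `irqaffinity_of` is made up for illustration. It extracts the `irqaffinity=` value from a kernel command-line string.

```shell
# Print the value of irqaffinity= found in a kernel command-line string.
# Splits on spaces and double quotes, so it also accepts a raw
# GRUB_CMDLINE_LINUX="..." line. Prints nothing if the parameter is absent.
irqaffinity_of() {
    printf '%s\n' "$1" | tr ' "' '\n\n' | sed -n 's/^irqaffinity=//p'
}

# On the node (left commented out; both paths are taken from the report):
#   irqaffinity_of "$(cat /proc/cmdline)"
#   irqaffinity_of "$(grep '^GRUB_CMDLINE_LINUX=' /etc/default/grub)"
```

If the two values differ, the file may have been edited after the running boot entry was generated; that is only one possible explanation.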

I also use reservedSystemCPUs in the kubelet config for system processes:

root@node31:~# cat /var/lib/kubelet/config.yaml  |grep -i cpu
cpuCFSQuota: true
cpuCFSQuotaPeriod: 100ms
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 10s
  cpu: 500m
reservedSystemCPUs: 0,10-19
  cpu: 500m
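
To check whether the interrupt CPUs fall inside the CPUs reserved for the system, the two CPU lists can be expanded and compared. This is a rough sketch under assumptions: the helper names `expand_cpulist` and `cpulist_subset` are hypothetical, and only plain comma-separated ids and ranges such as `0,10-19` are handled (no stride syntax).

```shell
# Expand a CPU list such as "0,10-19" into one CPU id per line.
expand_cpulist() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS='-' read -r lo hi; do
        [ -n "$lo" ] || continue
        [ -n "$hi" ] || hi=$lo      # a single id is a range of length one
        i=$lo
        while [ "$i" -le "$hi" ]; do
            printf '%s\n' "$i"
            i=$((i + 1))
        done
    done
}

# Succeed if every CPU in list $1 also appears in list $2.
cpulist_subset() {
    for cpu in $(expand_cpulist "$1"); do
        expand_cpulist "$2" | grep -qx "$cpu" || return 1
    done
    return 0
}

# Example with the values from this report (commented out):
#   cpulist_subset 0,10 0,10-19 && echo "IRQ CPUs are within reservedSystemCPUs"
```

With the values shown here, `0,10` is contained in `0,10-19`, while the `0,6` from /etc/default/grub is not.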

Context

I need to isolate some CPUs for exclusive use by the VPP application, so I use irqaffinity to concentrate interrupts on the other CPUs, e.g. 0 and 10.
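
The `irqaffinity=` parameter sets the default IRQ affinity mask at boot; the affinity each IRQ actually has can be read from /proc/irq/<N>/smp_affinity_list. A small sketch for listing them follows. The function name `irq_affinities` and its optional root argument are assumptions added for illustration, not something used in the report.

```shell
# Print "<irq> <cpulist>" for every IRQ that exposes smp_affinity_list.
# $1: proc root (assumption: defaults to /proc; overridable for testing)
irq_affinities() {
    root="${1:-/proc}"
    for f in "$root"/irq/*/smp_affinity_list; do
        [ -r "$f" ] || continue          # skip unreadable or unmatched glob
        irq=${f%/smp_affinity_list}      # strip the file name
        irq=${irq##*/}                   # keep only the IRQ number
        printf '%s %s\n' "$irq" "$(cat "$f")"
    done
}

# On the node (commented out):
#   irq_affinities | sort -n
```

IRQs that list CPUs other than the intended ones (0 and 10 in this report) would be the ones to look at first, for example those belonging to the network interface.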

Your Environment

  • Calico version: v3.26.1, installed with helm using the calico operator.
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes with kubeadm, v1.25.11, containerd, just one master node.
  • Operating System and version: ubuntu 22.04
  • Link to your project (optional):

weironz avatar Apr 26 '24 01:04 weironz

I have the same question.

Warning FailedCreatePodSandBox 114m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3d6733531c3caf74141893557e9d697f5ba909a3a08874087ed6892142153048" network for pod "calico-kube-controllers-7b84757b95-576fg": networkPlugin cni failed to set up pod "calico-kube-controllers-7b84757b95-576fg_kube-system" network: plugin type="calico" failed (add): error creating calico client: stat /etc/cni/net.d/calico-kubeconfig: no such file or directory

Warning Unhealthy 113m (x7 over 114m) kubelet Readiness probe failed: Error initializing datastore: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: connect: no route to host

fs2016l avatar Apr 27 '24 13:04 fs2016l

When I shut down the firewall, the error disappeared, but I need this to work while the firewall is running.
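
One common way a host firewall produces `no route to host` on a TCP connect is a REJECT rule that answers with `icmp-host-prohibited`, as found at the end of many iptables-based rule sets. Whether that applies here is an assumption, since the comment does not say which firewall is in use. A minimal sketch for spotting such rules in `iptables-save` output (the function name `host_prohibited_rules` is hypothetical):

```shell
# Read iptables-save output on stdin and print the rules that reject
# traffic with icmp-host-prohibited. Exits non-zero if none are found.
host_prohibited_rules() {
    grep -e '--reject-with icmp-host-prohibited'
}

# On the node (commented out; requires root):
#   iptables-save | host_prohibited_rules
```

If such a rule sits ahead of the rules that should accept pod and service traffic, allowing that traffic explicitly in the firewall, rather than disabling it, would be the thing to try. On hosts where the firewall uses an nftables backend, this output may not show the relevant rules.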

fs2016l avatar Apr 27 '24 13:04 fs2016l

@willzhang do you use calico VPP?

tomastigera avatar Apr 29 '24 23:04 tomastigera

> @willzhang do you use calico VPP?

No, just calico IPIP, installed with helm.

weironz avatar Apr 30 '24 01:04 weironz

@willzhang could you provide any logs from the failing pods? Why are they failing? It is not obvious why IRQ affinity would have such an effect, but perhaps there is some misconfiguration of the network devices? Are you using an overlay? Are the queues on the overlay assigned properly? I think vxlan.calico has a single queue only.
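
The queue question can be answered from sysfs, where each network device exposes its queues as `rx-N` and `tx-N` directories under /sys/class/net/<dev>/queues. A rough sketch follows; the function name `queue_counts` and its optional sysfs-root argument are assumptions made for illustration.

```shell
# Print the number of RX and TX queues of a network device.
# $1: device name, e.g. vxlan.calico
# $2: sysfs net root (assumption: defaults to /sys/class/net)
queue_counts() {
    dir="${2:-/sys/class/net}/$1/queues"
    [ -d "$dir" ] || { echo "no such device: $1" >&2; return 1; }
    rx=$(find "$dir" -maxdepth 1 -name 'rx-*' | wc -l)
    tx=$(find "$dir" -maxdepth 1 -name 'tx-*' | wc -l)
    # The arithmetic expansion strips any padding wc adds.
    echo "$1 rx=$((rx + 0)) tx=$((tx + 0))"
}

# On the node (commented out; tunl0 is the device Calico typically uses for IPIP):
#   queue_counts vxlan.calico
#   queue_counts tunl0
```

A device reporting a single RX and TX queue handles all of its traffic on one queue, which is the situation the comment above describes for vxlan.calico.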

tomastigera avatar May 07 '24 16:05 tomastigera

Solved the problem by reinstalling the operating system.

weironz avatar Sep 11 '24 01:09 weironz