Calico broken routes on master node with multiple NICs
Context:
We use two NICs on our servers: one (ens192) for admin traffic (SSH) and another (ens224) for production traffic (app exposure, monitoring data, DNS, etc.). Iptables rules are already in place to allow specific IPs to reach ens192 over SSH, and to allow specific traffic on ens224 (for example, outbound traffic such as DNS, NTP, LDAP, apps, etc.).
We also have policy-based routing on the server (to avoid asymmetric routing): all traffic coming in through ens192 should go back out through ens192, and the same for ens224. For that, two policy rules are in place, each pointing to a dedicated lookup table (a sketch of how this is set up follows the rule listing below): 500 => ens224
ip route show table 500
default via 192.168.143.250 dev ens224 proto static src 192.168.143.53 metric 100
192.168.143.0/24 via 192.168.143.53 dev ens224 proto static src 192.168.143.53 metric 99
192.168.143.250 dev ens224 proto static scope link src 192.168.143.53 metric 100
600 => ens192
ip route show table 600
default via 10.252.144.250 dev ens192 proto static src 10.252.144.23 metric 101
10.252.144.250 dev ens192 proto static scope link src 10.252.144.23 metric 101
The main default route is over ens224:
ip route show
default via 192.168.143.250 dev ens224 proto static metric 100
IP rule list:
ip rule list
0: from all lookup local
500: from 192.168.143.53 lookup stickyinterfaceprod proto static
600: from 10.252.144.23 lookup stickyinterfaceadmin proto static
32766: from all lookup main
32767: from all lookup default
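For reference, here is a minimal sketch of the commands that would produce this policy routing (an assumption on my side: the table names stickyinterfaceprod/stickyinterfaceadmin are mapped to 500/600 in /etc/iproute2/rt_tables, and only the default routes are shown):
# declare the named tables (assumed mapping)
echo "500 stickyinterfaceprod" >> /etc/iproute2/rt_tables
echo "600 stickyinterfaceadmin" >> /etc/iproute2/rt_tables
# production: replies sourced from 192.168.143.53 go back out ens224
ip route add default via 192.168.143.250 dev ens224 src 192.168.143.53 metric 100 table stickyinterfaceprod
ip rule add from 192.168.143.53 lookup stickyinterfaceprod
# admin: replies sourced from 10.252.144.23 go back out ens192
ip route add default via 10.252.144.250 dev ens192 src 10.252.144.23 metric 101 table stickyinterfaceadmin
ip rule add from 10.252.144.23 lookup stickyinterfaceadmin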
Expected Behavior
Calico should respect routing over ens224
Current Behavior
Calico creates the following routes:
blackhole 172.80.6.0/26 proto 80
172.80.6.1 dev calib670fd5cc65 scope link
172.80.6.2 dev caliba0d39000ce scope link
172.80.6.3 dev calibbe91b2e23c scope link
172.80.6.4 dev caliba9ebb5cc2a scope link
ip route
default via 192.168.143.250 dev ens224 proto static metric 100
10.252.144.0/24 dev ens192 proto kernel scope link src 10.252.144.23 metric 101
blackhole 172.80.6.0/26 proto 80
172.80.6.1 dev calib670fd5cc65 scope link
172.80.6.2 dev caliba0d39000ce scope link
172.80.6.3 dev calibbe91b2e23c scope link
172.80.6.4 dev caliba9ebb5cc2a scope link
192.168.143.0/24 dev ens224 proto kernel scope link src 192.168.143.53 metric 100
Once the Calico routes are created, routing looks broken: traffic toward the pod IPs gets sourced from the ens192 address (10.252.x.x):
ip route get 172.80.6.2
172.80.6.2 dev caliba0d39000ce src 10.252.144.23 uid 0
cache
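For comparison, the same lookup can be forced with the ens224 address as source (just a diagnostic idea, output not captured here):
ip route get 172.80.6.2 from 192.168.143.53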
Calico is installed after the cluster is initialized, with the following steps:
- kubeadm init --apiserver-advertise-address=192.168.143.53 --pod-network-cidr=172.80.0.0/21 --control-plane-endpoint=192.168.143.53:6443
- kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/tigera-operator.yaml
- kubectl apply -f custom-resources.yaml
custom-resources.yaml contains:
# This section includes base Calico installation configuration.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 172.80.0.0/21
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
---
# This section configures the Calico API server.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
---
kind: Installation
apiVersion: operator.tigera.io/v1
metadata:
  name: default
spec:
  calicoNetwork:
    nodeAddressAutodetectionV4:
      interface: ens224
I was thinking about an issue with the IPv4 address autodetection, which is why I added "interface: ens224" a few minutes ago, but nothing changed.
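To double-check what the operator actually ended up with, the merged Installation resource and the address Calico detected on the node can be inspected, for example (commands only, output not captured here):
kubectl get installation default -o yaml
kubectl describe node dev-test-frm-k8s-master01-v | grep projectcalico.org/IPv4Address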
All pods with an IP in 172.80.6.x are not in READY state:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-system calico-kube-controllers-5455db6fb-b6pvp 0/1 CrashLoopBackOff 8 (4m20s ago) 29m 172.80.6.4 dev-test-frm-k8s-master01-v <none> <none>
calico-system calico-node-skpvg 1/1 Running 0 29m 192.168.143.53 dev-test-frm-k8s-master01-v <none> <none>
calico-system calico-typha-54d99cfb88-jccf9 1/1 Running 0 29m 192.168.143.53 dev-test-frm-k8s-master01-v <none> <none>
calico-system csi-node-driver-2th45 2/2 Running 0 29m 172.80.6.3 dev-test-frm-k8s-master01-v <none> <none>
kube-system coredns-7c65d6cfc9-khj8h 0/1 CrashLoopBackOff 9 (2m53s ago) 42m 172.80.6.2 dev-test-frm-k8s-master01-v <none> <none>
kube-system coredns-7c65d6cfc9-vt8vb 0/1 CrashLoopBackOff 9 (2m41s ago) 42m 172.80.6.1 dev-test-frm-k8s-master01-v <none> <none>
kube-system etcd-dev-test-frm-k8s-master01-v 1/1 Running 1 42m 192.168.143.53 dev-test-frm-k8s-master01-v <none> <none>
kube-system kube-apiserver-dev-test-frm-k8s-master01-v 1/1 Running 0 42m 192.168.143.53 dev-test-frm-k8s-master01-v <none> <none>
kube-system kube-controller-manager-dev-test-frm-k8s-master01-v 1/1 Running 0 42m 192.168.143.53 dev-test-frm-k8s-master01-v <none> <none>
kube-system kube-proxy-g7dk7 1/1 Running 0 42m 192.168.143.53 dev-test-frm-k8s-master01-v <none> <none>
kube-system kube-scheduler-dev-test-frm-k8s-master01-v 1/1 Running 0 42m 192.168.143.53 dev-test-frm-k8s-master01-v <none> <none>
tigera-operator tigera-operator-89c775547-rshk9 1/1 Running 0 29m 192.168.143.53 dev-test-frm-k8s-master01-v <none> <none>
And I can see martian source warnings in the kernel logs when pods try to communicate with the master node IP:
Sep 30 11:45:01 DEV-TEST-FRM-K8S-MASTER01-V kernel: IPv4: martian source 192.168.143.53 from 172.80.6.1, on dev calib670fd5cc65
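These martian warnings are what the kernel logs when its source validation (rp_filter) rejects a packet on the receiving interface; for reference, the relevant settings can be checked like this (output not captured here):
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter
sysctl net.ipv4.conf.all.log_martians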
Possible Solution
I don't know
Your Environment
- Calico version: 3.28.2
- Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes v1.31.0
- Operating System and version: Rocky Linux 8.10 - 4.18.0-553.16.1.el8_10.x86_64 #1 SMP Thu Aug 8 17:47:08 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Can someone help me debug this behavior? I remain at your disposal for any questions or additional information.