Broken Calico routes on master node with multiple NICs

Open · yakhatape opened this issue 4 months ago · 1 comment

Context:

We use two NICs on our servers: one (ens192) for admin traffic (SSH) and another (ens224) for production traffic (app exposure, monitoring data, DNS, etc.). Iptables rules are already in place to authorize specific IPs to reach ens192 over SSH and to allow specific traffic on ens224 (for example outbound traffic such as DNS, NTP, LDAP, the apps, etc.); a simplified sketch of those rules follows.
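
These are illustrative only, not our exact rule set (addresses and chains simplified):

# admin NIC: only allow SSH from specific admin IPs (example subnet, not the real one)
iptables -A INPUT -i ens192 -p tcp --dport 22 -s 10.252.144.0/24 -j ACCEPT
iptables -A INPUT -i ens192 -j DROP
# production NIC: allow selected outbound services (DNS/NTP/LDAP shown as examples)
iptables -A OUTPUT -o ens224 -p udp --dport 53 -j ACCEPT
iptables -A OUTPUT -o ens224 -p udp --dport 123 -j ACCEPT
iptables -A OUTPUT -o ens224 -p tcp --dport 389 -j ACCEPT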

We have specific policy routing on the server (to avoid asymmetric routing): all traffic coming in through ens192 should go back out through ens192, and likewise for ens224. For that we have two policy rules based on table lookups (the commands we use to set this up are sketched after the route dumps below):

500 => ens224

ip route show table 500
default via 192.168.143.250 dev ens224 proto static src 192.168.143.53 metric 100
192.168.143.0/24 via 192.168.143.53 dev ens224 proto static src 192.168.143.53 metric 99
192.168.143.250 dev ens224 proto static scope link src 192.168.143.53 metric 100

600 => ens192

ip route show table 600
default via 10.252.144.250 dev ens192 proto static src 10.252.144.23 metric 101
10.252.144.250 dev ens192 proto static scope link src 10.252.144.23 metric 101
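
For reference, this is roughly how those tables and rules are provisioned (a simplified sketch; the real configuration is applied by our provisioning tooling):

# per-interface routing tables (500 = prod/ens224, 600 = admin/ens192)
ip route add default via 192.168.143.250 dev ens224 src 192.168.143.53 table 500
ip route add 192.168.143.0/24 via 192.168.143.53 dev ens224 src 192.168.143.53 table 500
ip route add default via 10.252.144.250 dev ens192 src 10.252.144.23 table 600
# traffic sourced from each address goes back out through its own interface
ip rule add from 192.168.143.53 lookup 500 priority 500
ip rule add from 10.252.144.23 lookup 600 priority 600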

The main default route is via ens224:

ip route show
default via 192.168.143.250 dev ens224 proto static metric 100

IP rule list:

ip rule list
0:      from all lookup local
500:    from 192.168.143.53 lookup stickyinterfaceprod proto static
600:    from 10.252.144.23 lookup stickyinterfaceadmin proto static
32766:  from all lookup main
32767:  from all lookup default
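
The stickyinterfaceprod and stickyinterfaceadmin names are just aliases for tables 500 and 600, declared in /etc/iproute2/rt_tables roughly as follows (reconstructed from memory):

cat /etc/iproute2/rt_tables
# (standard entries omitted)
500     stickyinterfaceprod
600     stickyinterfaceadmin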

Expected Behavior

Calico should respect the existing policy routing and keep traffic on ens224.

Current Behavior

Calico creates the following routes: a blackhole route for the node's 172.80.6.0/26 block plus one scope-link route per pod via its cali* interface, all visible in the full ip route output:

 ip route
default via 192.168.143.250 dev ens224 proto static metric 100
10.252.144.0/24 dev ens192 proto kernel scope link src 10.252.144.23 metric 101
blackhole 172.80.6.0/26 proto 80
172.80.6.1 dev calib670fd5cc65 scope link
172.80.6.2 dev caliba0d39000ce scope link
172.80.6.3 dev calibbe91b2e23c scope link
172.80.6.4 dev caliba9ebb5cc2a scope link
192.168.143.0/24 dev ens224 proto kernel scope link src 192.168.143.53 metric 100

Once the Calico routes are created, routing looks broken: traffic towards the pod addresses gets sourced from the ens192 address (10.252.x):

ip route get 172.80.6.2
172.80.6.2 dev caliba0d39000ce src 10.252.144.23 uid 0
    cache
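
What I would expect instead, so that pod traffic stays consistent with the ens224 policy rule, is the ens224 address being selected as the source, i.e. something like this (hypothetical output, written out only to illustrate the expectation):

ip route get 172.80.6.2
172.80.6.2 dev caliba0d39000ce src 192.168.143.53 uid 0
    cache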

Calico was installed once the cluster was initialized, following these steps:

  1. kubeadm init --apiserver-advertise-address=192.168.143.53 --pod-network-cidr=172.80.0.0/21 --control-plane-endpoint=192.168.143.53:6443

  2. kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/tigera-operator.yaml

  3. kubectl apply -f custom-resources.yaml

custom-resources.yaml contains:

# This section includes base Calico installation configuration.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 172.80.0.0/21
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

---

# This section configures the Calico API server.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}

---

kind: Installation
apiVersion: operator.tigera.io/v1
metadata:
  name: default
spec:
  calicoNetwork:
    nodeAddressAutodetectionV4:
        interface: ens224

I was wondering whether it was an IPv4 autodetection issue, which is why I added the "interface: ens224" block a few minutes ago, but nothing changed.
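
(For clarity: I'm not sure whether a second Installation document named "default" is merged with the first one or replaces it. What I intended is equivalent to a single Installation spec like this sketch, combining the pool and the interface autodetection:)

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    nodeAddressAutodetectionV4:
      interface: ens224
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 172.80.0.0/21
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()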

None of the pods with a 172.80.6.x IP reach READY status:

NAMESPACE         NAME                                                  READY   STATUS             RESTARTS        AGE   IP               NODE                          NOMINATED NODE   READINESS GATES
calico-system     calico-kube-controllers-5455db6fb-b6pvp               0/1     CrashLoopBackOff   8 (4m20s ago)   29m   172.80.6.4       dev-test-frm-k8s-master01-v   <none>           <none>
calico-system     calico-node-skpvg                                     1/1     Running            0               29m   192.168.143.53   dev-test-frm-k8s-master01-v   <none>           <none>
calico-system     calico-typha-54d99cfb88-jccf9                         1/1     Running            0               29m   192.168.143.53   dev-test-frm-k8s-master01-v   <none>           <none>
calico-system     csi-node-driver-2th45                                 2/2     Running            0               29m   172.80.6.3       dev-test-frm-k8s-master01-v   <none>           <none>
kube-system       coredns-7c65d6cfc9-khj8h                              0/1     CrashLoopBackOff   9 (2m53s ago)   42m   172.80.6.2       dev-test-frm-k8s-master01-v   <none>           <none>
kube-system       coredns-7c65d6cfc9-vt8vb                              0/1     CrashLoopBackOff   9 (2m41s ago)   42m   172.80.6.1       dev-test-frm-k8s-master01-v   <none>           <none>
kube-system       etcd-dev-test-frm-k8s-master01-v                      1/1     Running            1               42m   192.168.143.53   dev-test-frm-k8s-master01-v   <none>           <none>
kube-system       kube-apiserver-dev-test-frm-k8s-master01-v            1/1     Running            0               42m   192.168.143.53   dev-test-frm-k8s-master01-v   <none>           <none>
kube-system       kube-controller-manager-dev-test-frm-k8s-master01-v   1/1     Running            0               42m   192.168.143.53   dev-test-frm-k8s-master01-v   <none>           <none>
kube-system       kube-proxy-g7dk7                                      1/1     Running            0               42m   192.168.143.53   dev-test-frm-k8s-master01-v   <none>           <none>
kube-system       kube-scheduler-dev-test-frm-k8s-master01-v            1/1     Running            0               42m   192.168.143.53   dev-test-frm-k8s-master01-v   <none>           <none>
tigera-operator   tigera-operator-89c775547-rshk9                       1/1     Running            0               29m   192.168.143.53   dev-test-frm-k8s-master01-v   <none>           <none>

And I can see martian-source warnings in the kernel logs when pods try to communicate with the master node's IP:

Sep 30 11:45:01 DEV-TEST-FRM-K8S-MASTER01-V kernel: IPv4: martian source 192.168.143.53 from 172.80.6.1, on dev calib670fd5cc65
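
These warnings suggest the packets are likely being rejected by reverse-path filtering; for reference, the relevant sysctls can be inspected like this (commands only, I have not included my per-interface values here):

sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.ens192.rp_filter
sysctl net.ipv4.conf.ens224.rp_filter
sysctl net.ipv4.conf.calib670fd5cc65.rp_filter
sysctl net.ipv4.conf.all.log_martians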

Possible Solution

I don't know

Your Environment

  • Calico version => 3.28.2
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes v1.31.0
  • Operating System and version: Rocky Linux 8.10 - 4.18.0-553.16.1.el8_10.x86_64 #1 SMP Thu Aug 8 17:47:08 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Can someone help me debug this behavior? I remain at your disposal for any questions or further details.

yakhatape · Sep 30 '24 09:09