
Warning logs constantly occur in calico-node pods and the kernel

Open moonek opened this issue 3 years ago • 9 comments

I've installed Calico (VXLAN mode) with the same manifest on dozens of Kubernetes clusters, but this is the first time this has happened. All Calico pods are running, and overlay network communication is working fine.

However, the following warnings are logged constantly in all calico-node pods:

2022-01-18 09:57:23.011 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.251 HardwareAddr: LLIPAddr:<nil> Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
2022-01-18 09:57:23.011 [INFO][51] felix/route_table.go 982: Removed old neighbor ARP entry ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 neighbor=netlink.Neigh{LinkIndex:4, Family:2, State:32, Type:5, Flags:0, IP:net.IP{0xe0, 0x0, 0x0, 0xfb}, HardwareAddr:net.HardwareAddr(nil), LLIPAddr:net.IP(nil), Vlan:0, VNI:0, MasterIndex:0}
2022-01-18 09:57:23.011 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.22 HardwareAddr: LLIPAddr:<nil> Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
2022-01-18 09:57:23.011 [INFO][51] felix/route_table.go 982: Removed old neighbor ARP entry ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 neighbor=netlink.Neigh{LinkIndex:4, Family:2, State:32, Type:5, Flags:0, IP:net.IP{0xe0, 0x0, 0x0, 0x16}, HardwareAddr:net.HardwareAddr(nil), LLIPAddr:net.IP(nil), Vlan:0, VNI:0, MasterIndex:0}
2022-01-18 09:57:23.011 [INFO][51] felix/route_table.go 417: Trying to connect to netlink
2022-01-18 09:57:23.011 [WARNING][51] felix/route_table.go 1118: Failed to access interface but it appears to be up error=netlink update operation failed ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 link=&netlink.Vxlan{LinkAttrs:netlink.LinkAttrs{Index:4, MTU:1450, TxQLen:0, Name:"vxlan.calico", HardwareAddr:net.HardwareAddr{0x66, 0x76, 0x38, 0x5d, 0x31, 0xcd}, Flags:0x13, RawFlags:0x11043, ParentIndex:0, MasterIndex:0, Namespace:interface {}(nil), Alias:"", Statistics:(*netlink.LinkStatistics)(0xc000d39680), Promisc:0, Xdp:(*netlink.LinkXdp)(nil), EncapType:"ether", Protinfo:(*netlink.Protinfo)(nil), OperState:0x0, NetNsID:0, NumTxQueues:1, NumRxQueues:1, GSOMaxSize:0x10000, GSOMaxSegs:0xffff, Vfs:[]netlink.VfInfo(nil), Group:0x0, Slave:netlink.LinkSlave(nil)}, VxlanId:4096, VtepDevIndex:2, SrcAddr:net.IP{0xb6, 0xc5, 0x38, 0x34}, Group:net.IP(nil), TTL:0, TOS:0, Learning:false, Proxy:false, RSC:false, L2miss:false, L3miss:false, UDPCSum:false, UDP6ZeroCSumTx:false, UDP6ZeroCSumRx:false, NoAge:false, GBP:false, FlowBased:false, Age:300, Limit:0, Port:4789, PortLow:0, PortHigh:0}
2022-01-18 09:57:23.012 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.251 HardwareAddr: LLIPAddr:<nil> Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
2022-01-18 09:57:23.012 [INFO][51] felix/route_table.go 982: Removed old neighbor ARP entry ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 neighbor=netlink.Neigh{LinkIndex:4, Family:2, State:32, Type:5, Flags:0, IP:net.IP{0xe0, 0x0, 0x0, 0xfb}, HardwareAddr:net.HardwareAddr(nil), LLIPAddr:net.IP(nil), Vlan:0, VNI:0, MasterIndex:0}
2022-01-18 09:57:23.012 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.22 HardwareAddr: LLIPAddr:<nil> Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
2022-01-18 09:57:23.012 [INFO][51] felix/route_table.go 982: Removed old neighbor ARP entry ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 neighbor=netlink.Neigh{LinkIndex:4, Family:2, State:32, Type:5, Flags:0, IP:net.IP{0xe0, 0x0, 0x0, 0x16}, HardwareAddr:net.HardwareAddr(nil), LLIPAddr:net.IP(nil), Vlan:0, VNI:0, MasterIndex:0}
2022-01-18 09:57:23.012 [INFO][51] felix/route_table.go 417: Trying to connect to netlink

The kernel also logs the following message continuously:

2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address

One peculiar thing I found is that the neighbor entries for the multicast IPs (224.0.0.22 and 224.0.0.251) on vxlan.calico are in the FAILED state:

worker01:/root#ip neighbor show
192.168.233.64 dev vxlan.calico lladdr 66:67:bd:3f:3c:e5 PERMANENT
10.10.10.18 dev ens192 lladdr 02:50:56:56:44:52 REACHABLE
10.10.10.51 dev ens192 lladdr 00:50:56:8b:07:76 STALE
192.168.232.1 dev caliec31aa2914e lladdr 62:20:35:fd:c1:b4 REACHABLE
224.0.0.22 dev vxlan.calico  FAILED
10.10.10.1 dev ens192 lladdr 02:50:56:56:44:52 REACHABLE
192.168.178.0 dev vxlan.calico lladdr 66:76:38:5d:31:cd PERMANENT
192.168.171.192 dev vxlan.calico lladdr 66:b6:99:56:ef:dc PERMANENT
10.10.10.52 dev ens192 lladdr 00:50:56:8b:9d:7b REACHABLE
10.10.10.60 dev ens192 lladdr 00:50:56:8b:a4:f1 REACHABLE
224.0.0.251 dev vxlan.calico  FAILED
10.10.10.53 dev ens192 lladdr 00:50:56:8b:10:23 STALE
192.168.139.192 dev vxlan.calico lladdr 66:62:8c:74:d3:02 PERMANENT
192.168.194.128 dev vxlan.calico lladdr 66:a6:2d:f1:7d:ca PERMANENT
192.168.128.128 dev vxlan.calico lladdr 66:b5:1e:0f:86:c3 PERMANENT
worker01:/root#arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.233.64            ether   66:67:bd:3f:3c:e5   CM                    vxlan.calico
10.10.10.18               ether   02:50:56:56:44:52   C                     ens192
10.10.10.51               ether   00:50:56:8b:07:76   C                     ens192
192.168.232.1             ether   62:20:35:fd:c1:b4   C                     caliec31aa2914e
224.0.0.22                       (incomplete)                              vxlan.calico
10.10.10.1                ether   02:50:56:56:44:52   C                     ens192
192.168.178.0             ether   66:76:38:5d:31:cd   CM                    vxlan.calico
192.168.171.192           ether   66:b6:99:56:ef:dc   CM                    vxlan.calico
10.10.10.52               ether   00:50:56:8b:9d:7b   C                     ens192
10.10.10.60               ether   00:50:56:8b:a4:f1   C                     ens192
224.0.0.251                      (incomplete)                              vxlan.calico
10.10.10.53               ether   00:50:56:8b:10:23   C                     ens192
192.168.139.192           ether   66:62:8c:74:d3:02   CM                    vxlan.calico
192.168.194.128           ether   66:a6:2d:f1:7d:ca   CM                    vxlan.calico
192.168.128.128           ether   66:b5:1e:0f:86:c3   CM                    vxlan.calico
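To pick the odd entries out of the ip neighbor show output above, a small Go sketch can parse each line and keep only multicast addresses on vxlan.calico. It assumes the simple "<ip> dev <dev> [lladdr <mac>] <STATE>" line shape seen here and skips anything else; the type and function names are illustrative, not part of any Calico tool.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// neighEntry holds the fields we care about from one line of `ip neighbor show`.
type neighEntry struct {
	IP     net.IP
	Dev    string
	LLAddr string // empty when no link-layer address is known (e.g. FAILED)
	State  string
}

// parseNeighbors parses lines of the form "<ip> dev <dev> [lladdr <mac>] <STATE>".
// Lines that do not start with a valid IP are skipped.
func parseNeighbors(out string) []neighEntry {
	var entries []neighEntry
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 {
			continue
		}
		ip := net.ParseIP(fields[0])
		if ip == nil {
			continue
		}
		e := neighEntry{IP: ip, State: fields[len(fields)-1]}
		// Walk keyword/value pairs between the IP and the trailing state.
		for i := 1; i+1 < len(fields); i++ {
			switch fields[i] {
			case "dev":
				e.Dev = fields[i+1]
			case "lladdr":
				e.LLAddr = fields[i+1]
			}
		}
		entries = append(entries, e)
	}
	return entries
}

// multicastOn returns the entries on dev whose IP is a multicast address.
func multicastOn(entries []neighEntry, dev string) []neighEntry {
	var out []neighEntry
	for _, e := range entries {
		if e.Dev == dev && e.IP.IsMulticast() {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	sample := `192.168.233.64 dev vxlan.calico lladdr 66:67:bd:3f:3c:e5 PERMANENT
224.0.0.22 dev vxlan.calico  FAILED
10.10.10.1 dev ens192 lladdr 02:50:56:56:44:52 REACHABLE
224.0.0.251 dev vxlan.calico  FAILED`
	for _, e := range multicastOn(parseNeighbors(sample), "vxlan.calico") {
		fmt.Printf("%s state=%s lladdr=%q\n", e.IP, e.State, e.LLAddr)
	}
}
```

Fed the full output above, it should report only 224.0.0.22 and 224.0.0.251, both FAILED and with no lladdr.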


Calico ConfigMap and DaemonSet:

---
# Source: calico/templates/calico-config.yaml
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the backend to use.
  calico_backend: "vxlan"

  # Configure the MTU to use for workload interfaces and tunnels.
  # By default, MTU is auto-detected, and explicitly setting this field should not be required.
  # You can override auto-detection by providing a non-zero value.
  veth_mtu: "0"

  # The CNI network configuration to install on each node. The special
  # values in this config will be automatically populated.
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "log_file_path": "/var/log/calico/cni/cni.log",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "calico-ipam"
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        }
      ]
    }
---
# Source: calico/templates/calico-node.yaml
# This manifest installs the calico-node container, as well
# as the CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: calico-node
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      hostNetwork: true
      tolerations:
        # Make sure calico-node gets scheduled on all nodes.
        - effect: NoSchedule
          operator: Exists
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      serviceAccountName: calico-node
      # Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
      # deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
      terminationGracePeriodSeconds: 0
      priorityClassName: system-node-critical
      initContainers:
        # This container performs upgrade from host-local IPAM to calico-ipam.
        # It can be deleted if this is a fresh installation, or if you have already
        # upgraded to use calico-ipam.
        - name: upgrade-ipam
          image: docker.io/calico/cni:v3.19.3
          command: ["/opt/cni/bin/calico-ipam", "-upgrade"]
          envFrom:
          - configMapRef:
              # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
              name: kubernetes-services-endpoint
              optional: true
          env:
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
          volumeMounts:
            - mountPath: /var/lib/cni/networks
              name: host-local-net-dir
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
          securityContext:
            privileged: true
        # This container installs the CNI binaries
        # and CNI network config file on each node.
        - name: install-cni
          image: docker.io/calico/cni:v3.19.3
          command: ["/opt/cni/bin/install"]
          envFrom:
          - configMapRef:
              # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
              name: kubernetes-services-endpoint
              optional: true
          env:
            # Name of the CNI config file to create.
            - name: CNI_CONF_NAME
              value: "10-calico.conflist"
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
            # Set the hostname based on the k8s node name.
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # CNI MTU Config variable
            - name: CNI_MTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # Prevents the container from sleeping forever.
            - name: SLEEP
              value: "false"
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
          securityContext:
            privileged: true
        # Adds a Flex Volume Driver that creates a per-pod Unix Domain Socket to allow Dikastes
        # to communicate with Felix over the Policy Sync API.
        - name: flexvol-driver
          image: docker.io/calico/pod2daemon-flexvol:v3.19.3
          volumeMounts:
          - name: flexvol-driver-host
            mountPath: /host/driver
          securityContext:
            privileged: true
      containers:
        # Runs calico-node container on each Kubernetes node. This
        # container programs network policy and routes on each
        # host.
        - name: calico-node
          image: docker.io/calico/node:v3.19.3
          envFrom:
          - configMapRef:
              # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
              name: kubernetes-services-endpoint
              optional: true
          env:
            # Use Kubernetes API as the backing datastore.
            - name: DATASTORE_TYPE
              value: "kubernetes"
            # Wait for the datastore.
            - name: WAIT_FOR_DATASTORE
              value: "true"
            # Set based on the k8s node name.
            - name: NODENAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Choose the backend to use.
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
            # Cluster type to identify the deployment type
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            - name: IP_AUTODETECTION_METHOD
              value: "cidr=10.10.10.0/24"
            # Enable IPIP
            #- name: CALICO_IPV4POOL_IPIP
            #  value: "Never"
            # Enable or Disable VXLAN on the default IP pool.
            - name: CALICO_IPV4POOL_VXLAN
              value: "Always"
            # Set MTU for tunnel device used if ipip is enabled
            - name: FELIX_IPINIPMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # Set MTU for the VXLAN tunnel device.
            - name: FELIX_VXLANMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # Set MTU for the Wireguard tunnel device.
            - name: FELIX_WIREGUARDMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # The default IPv4 pool to create on startup if none exists. Pod IPs will be
            # chosen from this range. Changing this value after installation will have
            # no effect. This should fall within `--cluster-cidr`.
            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/16"
            # Disable file logging so `kubectl logs` works.
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            # Set Felix endpoint to host default action to ACCEPT.
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            # Disable IPv6 on Kubernetes.
            - name: FELIX_IPV6SUPPORT
              value: "false"
            # Set Felix logging to "info"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "info"
            - name: FELIX_HEALTHENABLED
              value: "true"
            - name: FELIX_FEATUREDETECTOVERRIDE
              value: "ChecksumOffloadBroken=true"
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 250m
          livenessProbe:
            exec:
              command:
              - /bin/calico-node
              - -felix-live
              #- -bird-live
            periodSeconds: 10
            initialDelaySeconds: 10
            failureThreshold: 6
          readinessProbe:
            exec:
              command:
              - /bin/calico-node
              - -felix-ready
              #- -bird-ready
            periodSeconds: 10
          volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /run/xtables.lock
              name: xtables-lock
              readOnly: false
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /var/lib/calico
              name: var-lib-calico
              readOnly: false
            - name: policysync
              mountPath: /var/run/nodeagent
            # For eBPF mode, we need to be able to mount the BPF filesystem at /sys/fs/bpf so we mount in the
            # parent directory.
            - name: sysfs
              mountPath: /sys/fs/
              # Bidirectional means that, if we mount the BPF filesystem at /sys/fs/bpf it will propagate to the host.
              # If the host is known to mount that filesystem already then Bidirectional can be omitted.
              mountPropagation: Bidirectional
            - name: cni-log-dir
              mountPath: /var/log/calico/cni
              readOnly: true
      volumes:
        # Used by calico-node.
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        - name: var-lib-calico
          hostPath:
            path: /var/lib/calico
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
        - name: sysfs
          hostPath:
            path: /sys/fs/
            type: DirectoryOrCreate
        # Used to install CNI.
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
        # Used to access CNI logs.
        - name: cni-log-dir
          hostPath:
            path: /var/log/calico/cni
        # Mount in the directory for host-local IPAM allocations. This is
        # used when upgrading from host-local to calico-ipam, and can be removed
        # if not using the upgrade-ipam init container.
        - name: host-local-net-dir
          hostPath:
            path: /var/lib/cni/networks
        # Used to create per-pod Unix Domain Sockets
        - name: policysync
          hostPath:
            type: DirectoryOrCreate
            path: /var/run/nodeagent
        # Used to install Flex Volume Driver
        - name: flexvol-driver-host
          hostPath:
            type: DirectoryOrCreate
            path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
---

Your Environment

  • Calico version: v3.19.3
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes v1.21.2
  • Operating System and version: RHEL 7.4, kernel 3.10.0-693.el7 (an old kernel, but other clusters without this problem use the same version)

moonek · Jan 18 '22 13:01

Do you know where that address (224.0.0.251) is coming from?

Do you have pods within the multicast IP range? Normally we'd only program routes to the VXLAN device for pod subnets, so I'm surprised to see that entry there at all.

caseydavenport · Jan 25 '22 17:01
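The point about pod subnets can be made concrete by checking each vxlan.calico neighbor address against the pod pool from the manifest (CALICO_IPV4POOL_CIDR = 192.168.0.0/16). This is a minimal Go sketch of that check, not Felix's own ownership logic; the labels it returns are made up for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// classify reports whether ipStr falls inside the given Calico IPv4 pool,
// is a multicast address, or neither. poolCIDR mirrors CALICO_IPV4POOL_CIDR.
func classify(ipStr, poolCIDR string) (string, error) {
	_, pool, err := net.ParseCIDR(poolCIDR)
	if err != nil {
		return "", err
	}
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return "", fmt.Errorf("invalid IP %q", ipStr)
	}
	switch {
	case pool.Contains(ip):
		return "pod-pool", nil
	case ip.IsMulticast(): // 224.0.0.0/4
		return "multicast", nil
	default:
		return "other", nil
	}
}

func main() {
	for _, s := range []string{"192.168.178.0", "224.0.0.251", "224.0.0.22", "10.10.10.1"} {
		c, err := classify(s, "192.168.0.0/16")
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-15s %s\n", s, c)
	}
}
```

Both 224.x addresses land outside the pod pool, which is why an entry for them on vxlan.calico looks foreign to Calico.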

@moonek any update on this one?

caseydavenport · Feb 08 '22 17:02

I couldn't solve it. The kernel is still generating 30-40 log lines per second. No IP in the 224.x range is in use anywhere.

$ kubectl get po -A -owide | grep 224
$

moonek · Feb 09 '22 00:02

Nowhere is IP in the 224 band being used.

Something must be programming that entry targeting the vxlan.calico interface, and Calico is correctly spotting that it doesn't belong there (because there are no pods with that IP on the network).

Calico only programs static entries with known hwaddr. I'm guessing this entry is being put there due to something in your cluster trying to talk multicast over the vxlan.calico interface.

Based on a quick Google search, those multicast IPs are registered with IANA: https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml#multicast-addresses-12

224.0.0.251 | mDNS
224.0.0.22 | IGMP

That said, anything could be trying to reach them. Are you running anything like that in your cluster?

caseydavenport · Feb 11 '22 17:02
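The registry lookup above can be sketched as a small Go table. It covers only a handful of groups from the IANA Local Network Control Block (224.0.0.0/24), including 224.0.0.252 (LLMNR), which comes up later in this thread; it is an illustrative subset, not the full registry.

```go
package main

import (
	"fmt"
	"net"
)

// wellKnown is a small, illustrative subset of IANA-registered groups in the
// Local Network Control Block (224.0.0.0/24). It is not the full registry.
var wellKnown = map[string]string{
	"224.0.0.1":   "All Systems on this Subnet",
	"224.0.0.2":   "All Routers on this Subnet",
	"224.0.0.22":  "IGMP (IGMPv3 membership reports)",
	"224.0.0.251": "mDNS",
	"224.0.0.252": "LLMNR",
}

// describe returns the registered use of ipStr if it is in the subset above,
// otherwise a coarse classification of the address.
func describe(ipStr string) string {
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return "invalid IP"
	}
	if name, ok := wellKnown[ip.String()]; ok {
		return name
	}
	_, localCtl, _ := net.ParseCIDR("224.0.0.0/24") // constant, cannot fail
	if localCtl.Contains(ip) {
		return "Local Network Control Block (not in this subset)"
	}
	if ip.IsMulticast() {
		return "multicast (not in this subset)"
	}
	return "not multicast"
}

func main() {
	for _, s := range []string{"224.0.0.251", "224.0.0.22", "224.0.0.252", "239.1.1.1"} {
		fmt.Printf("%-12s %s\n", s, describe(s))
	}
}
```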

It might be worth running tcpdump to see what's sending traffic to those addresses.

caseydavenport · Feb 11 '22 17:02
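When capturing, keep in mind that IPv4 multicast frames are not addressed to a per-host MAC: RFC 1112 maps a group to the Ethernet address 01:00:5e followed by the low-order 23 bits of the group address. The Go sketch below computes that mapping, which may help when writing a tcpdump filter such as ether dst 01:00:5e:00:00:fb on a specific interface; the function name is illustrative.

```go
package main

import (
	"fmt"
	"net"
)

// multicastMAC maps an IPv4 multicast group to its Ethernet address (RFC 1112):
// the fixed prefix 01:00:5e followed by the low-order 23 bits of the group.
func multicastMAC(group string) (net.HardwareAddr, error) {
	ip := net.ParseIP(group).To4() // To4 on a nil IP safely returns nil
	if ip == nil || !ip.IsMulticast() {
		return nil, fmt.Errorf("%q is not an IPv4 multicast address", group)
	}
	// Only 23 bits survive: the top bit of the second octet is dropped.
	return net.HardwareAddr{0x01, 0x00, 0x5e, ip[1] & 0x7f, ip[2], ip[3]}, nil
}

func main() {
	for _, g := range []string{"224.0.0.251", "224.0.0.22"} {
		mac, err := multicastMAC(g)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-12s -> %s\n", g, mac)
	}
}
```

For 224.0.0.251 this yields 01:00:5e:00:00:fb, and for 224.0.0.22 it yields 01:00:5e:00:00:16.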

In parallel, Calico should be able to remove those entries. We might need to fine-tune how we craft the deletion request so that it's valid, or find a way to skip entries we know we didn't program. (Currently we use the vxlan.calico interface to decide that, but if something else programs entries on that interface, we can't tell the difference.)

caseydavenport · Feb 11 '22 17:02

I have the same issue.

0123hoang · Aug 12 '22 04:08

@caseydavenport Do we need to find the source of the multicast IP before we can solve this? I tried to delete the 224.x neighbor entry, but it failed.

JornShen · Sep 01 '22 03:09

For more information: we are also seeing this issue on our 16.04 desktop nodes. These nodes are from a legacy deployment, and unfortunately their host OS and kernel cannot be upgraded.

After digging into those addresses, it appears that the avahi daemon is programming the mDNS (224.0.0.251) address, and systemd-resolve is programming the LLMNR (224.0.0.252) address. I added a whitelist for the primary network interface to avahi, and that address no longer appears in the neighbor list, but I'm still working out how to disable the other two.

This post was enlightening and described the addresses and their likely sources: https://superuser.com/questions/1063676/is-my-arp-cache-poisoned

This issue helped me whitelist the primary network interface for the avahi daemon without disabling it entirely: https://github.com/freepn/fpnd/issues/67

I don't consider any of these workarounds to be real solutions.

MiddleMan5 · Oct 12 '22 17:10