Warning logs constantly occur in calico-node pods and the kernel
I've installed Calico (VXLAN mode) with the same manifest on dozens of Kubernetes clusters, but this is the first time this has happened.
All calico pods are running and overlay network communication works fine, but the following warnings appear constantly in every calico-node pod:
2022-01-18 09:57:23.011 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.251 HardwareAddr: LLIPAddr:<nil> Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
2022-01-18 09:57:23.011 [INFO][51] felix/route_table.go 982: Removed old neighbor ARP entry ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 neighbor=netlink.Neigh{LinkIndex:4, Family:2, State:32, Type:5, Flags:0, IP:net.IP{0xe0, 0x0, 0x0, 0xfb}, HardwareAddr:net.HardwareAddr(nil), LLIPAddr:net.IP(nil), Vlan:0, VNI:0, MasterIndex:0}
2022-01-18 09:57:23.011 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.22 HardwareAddr: LLIPAddr:<nil> Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
2022-01-18 09:57:23.011 [INFO][51] felix/route_table.go 982: Removed old neighbor ARP entry ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 neighbor=netlink.Neigh{LinkIndex:4, Family:2, State:32, Type:5, Flags:0, IP:net.IP{0xe0, 0x0, 0x0, 0x16}, HardwareAddr:net.HardwareAddr(nil), LLIPAddr:net.IP(nil), Vlan:0, VNI:0, MasterIndex:0}
2022-01-18 09:57:23.011 [INFO][51] felix/route_table.go 417: Trying to connect to netlink
2022-01-18 09:57:23.011 [WARNING][51] felix/route_table.go 1118: Failed to access interface but it appears to be up error=netlink update operation failed ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 link=&netlink.Vxlan{LinkAttrs:netlink.LinkAttrs{Index:4, MTU:1450, TxQLen:0, Name:"vxlan.calico", HardwareAddr:net.HardwareAddr{0x66, 0x76, 0x38, 0x5d, 0x31, 0xcd}, Flags:0x13, RawFlags:0x11043, ParentIndex:0, MasterIndex:0, Namespace:interface {}(nil), Alias:"", Statistics:(*netlink.LinkStatistics)(0xc000d39680), Promisc:0, Xdp:(*netlink.LinkXdp)(nil), EncapType:"ether", Protinfo:(*netlink.Protinfo)(nil), OperState:0x0, NetNsID:0, NumTxQueues:1, NumRxQueues:1, GSOMaxSize:0x10000, GSOMaxSegs:0xffff, Vfs:[]netlink.VfInfo(nil), Group:0x0, Slave:netlink.LinkSlave(nil)}, VxlanId:4096, VtepDevIndex:2, SrcAddr:net.IP{0xb6, 0xc5, 0x38, 0x34}, Group:net.IP(nil), TTL:0, TOS:0, Learning:false, Proxy:false, RSC:false, L2miss:false, L3miss:false, UDPCSum:false, UDP6ZeroCSumTx:false, UDP6ZeroCSumRx:false, NoAge:false, GBP:false, FlowBased:false, Age:300, Limit:0, Port:4789, PortLow:0, PortHigh:0}
2022-01-18 09:57:23.012 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.251 HardwareAddr: LLIPAddr:<nil> Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
2022-01-18 09:57:23.012 [INFO][51] felix/route_table.go 982: Removed old neighbor ARP entry ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 neighbor=netlink.Neigh{LinkIndex:4, Family:2, State:32, Type:5, Flags:0, IP:net.IP{0xe0, 0x0, 0x0, 0xfb}, HardwareAddr:net.HardwareAddr(nil), LLIPAddr:net.IP(nil), Vlan:0, VNI:0, MasterIndex:0}
2022-01-18 09:57:23.012 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.22 HardwareAddr: LLIPAddr:<nil> Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
2022-01-18 09:57:23.012 [INFO][51] felix/route_table.go 982: Removed old neighbor ARP entry ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 neighbor=netlink.Neigh{LinkIndex:4, Family:2, State:32, Type:5, Flags:0, IP:net.IP{0xe0, 0x0, 0x0, 0x16}, HardwareAddr:net.HardwareAddr(nil), LLIPAddr:net.IP(nil), Vlan:0, VNI:0, MasterIndex:0}
2022-01-18 09:57:23.012 [INFO][51] felix/route_table.go 417: Trying to connect to netlink
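For reference, the raw `net.IP` byte strings in those felix log lines decode to the two multicast groups discussed below; a quick sanity check (plain POSIX shell, nothing Calico-specific):

```shell
# Decode the raw net.IP bytes printed by felix into dotted-quad form.
# 0xe0,0x00,0x00,0xfb and 0xe0,0x00,0x00,0x16 are the two neighbors
# felix keeps trying to delete.
a=$(printf '%d.%d.%d.%d' 0xe0 0x00 0x00 0xfb)
b=$(printf '%d.%d.%d.%d' 0xe0 0x00 0x00 0x16)
echo "$a"   # 224.0.0.251
echo "$b"   # 224.0.0.22
```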
The kernel also logs the following continuously:
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
2022-01-18 09:57:23 worker01 kernel: PF_BRIDGE: RTM_DELNEIGH with invalid address
The peculiar thing I found is that the multicast IPs show up as FAILED:
worker01:/root#ip neighbor show
192.168.233.64 dev vxlan.calico lladdr 66:67:bd:3f:3c:e5 PERMANENT
10.10.10.18 dev ens192 lladdr 02:50:56:56:44:52 REACHABLE
10.10.10.51 dev ens192 lladdr 00:50:56:8b:07:76 STALE
192.168.232.1 dev caliec31aa2914e lladdr 62:20:35:fd:c1:b4 REACHABLE
224.0.0.22 dev vxlan.calico FAILED
10.10.10.1 dev ens192 lladdr 02:50:56:56:44:52 REACHABLE
192.168.178.0 dev vxlan.calico lladdr 66:76:38:5d:31:cd PERMANENT
192.168.171.192 dev vxlan.calico lladdr 66:b6:99:56:ef:dc PERMANENT
10.10.10.52 dev ens192 lladdr 00:50:56:8b:9d:7b REACHABLE
10.10.10.60 dev ens192 lladdr 00:50:56:8b:a4:f1 REACHABLE
224.0.0.251 dev vxlan.calico FAILED
10.10.10.53 dev ens192 lladdr 00:50:56:8b:10:23 STALE
192.168.139.192 dev vxlan.calico lladdr 66:62:8c:74:d3:02 PERMANENT
192.168.194.128 dev vxlan.calico lladdr 66:a6:2d:f1:7d:ca PERMANENT
192.168.128.128 dev vxlan.calico lladdr 66:b5:1e:0f:86:c3 PERMANENT
worker01:/root#arp -n
Address HWtype HWaddress Flags Mask Iface
192.168.233.64 ether 66:67:bd:3f:3c:e5 CM vxlan.calico
10.10.10.18 ether 02:50:56:56:44:52 C ens192
10.10.10.51 ether 00:50:56:8b:07:76 C ens192
192.168.232.1 ether 62:20:35:fd:c1:b4 C caliec31aa2914e
224.0.0.22 (incomplete) vxlan.calico
10.10.10.1 ether 02:50:56:56:44:52 C ens192
192.168.178.0 ether 66:76:38:5d:31:cd CM vxlan.calico
192.168.171.192 ether 66:b6:99:56:ef:dc CM vxlan.calico
10.10.10.52 ether 00:50:56:8b:9d:7b C ens192
10.10.10.60 ether 00:50:56:8b:a4:f1 C ens192
224.0.0.251 (incomplete) vxlan.calico
10.10.10.53 ether 00:50:56:8b:10:23 C ens192
192.168.139.192 ether 66:62:8c:74:d3:02 CM vxlan.calico
192.168.194.128 ether 66:a6:2d:f1:7d:ca CM vxlan.calico
192.168.128.128 ether 66:b5:1e:0f:86:c3 CM vxlan.calico
Calico config and DaemonSet:
---
# Source: calico/templates/calico-config.yaml
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the backend to use.
  calico_backend: "vxlan"
  # Configure the MTU to use for workload interfaces and tunnels.
  # By default, MTU is auto-detected, and explicitly setting this field should not be required.
  # You can override auto-detection by providing a non-zero value.
  veth_mtu: "0"
  # The CNI network configuration to install on each node. The special
  # values in this config will be automatically populated.
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "log_file_path": "/var/log/calico/cni/cni.log",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "calico-ipam"
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        }
      ]
    }
---
# Source: calico/templates/calico-node.yaml
# This manifest installs the calico-node container, as well
# as the CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: calico-node
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      hostNetwork: true
      tolerations:
        # Make sure calico-node gets scheduled on all nodes.
        - effect: NoSchedule
          operator: Exists
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      serviceAccountName: calico-node
      # Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
      # deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
      terminationGracePeriodSeconds: 0
      priorityClassName: system-node-critical
      initContainers:
        # This container performs upgrade from host-local IPAM to calico-ipam.
        # It can be deleted if this is a fresh installation, or if you have already
        # upgraded to use calico-ipam.
        - name: upgrade-ipam
          image: docker.io/calico/cni:v3.19.3
          command: ["/opt/cni/bin/calico-ipam", "-upgrade"]
          envFrom:
            - configMapRef:
                # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
                name: kubernetes-services-endpoint
                optional: true
          env:
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
          volumeMounts:
            - mountPath: /var/lib/cni/networks
              name: host-local-net-dir
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
          securityContext:
            privileged: true
        # This container installs the CNI binaries
        # and CNI network config file on each node.
        - name: install-cni
          image: docker.io/calico/cni:v3.19.3
          command: ["/opt/cni/bin/install"]
          envFrom:
            - configMapRef:
                # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
                name: kubernetes-services-endpoint
                optional: true
          env:
            # Name of the CNI config file to create.
            - name: CNI_CONF_NAME
              value: "10-calico.conflist"
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
            # Set the hostname based on the k8s node name.
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # CNI MTU Config variable
            - name: CNI_MTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # Prevents the container from sleeping forever.
            - name: SLEEP
              value: "false"
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
          securityContext:
            privileged: true
        # Adds a Flex Volume Driver that creates a per-pod Unix Domain Socket to allow Dikastes
        # to communicate with Felix over the Policy Sync API.
        - name: flexvol-driver
          image: docker.io/calico/pod2daemon-flexvol:v3.19.3
          volumeMounts:
            - name: flexvol-driver-host
              mountPath: /host/driver
          securityContext:
            privileged: true
      containers:
        # Runs calico-node container on each Kubernetes node. This
        # container programs network policy and routes on each
        # host.
        - name: calico-node
          image: docker.io/calico/node:v3.19.3
          envFrom:
            - configMapRef:
                # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
                name: kubernetes-services-endpoint
                optional: true
          env:
            # Use Kubernetes API as the backing datastore.
            - name: DATASTORE_TYPE
              value: "kubernetes"
            # Wait for the datastore.
            - name: WAIT_FOR_DATASTORE
              value: "true"
            # Set based on the k8s node name.
            - name: NODENAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Choose the backend to use.
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
            # Cluster type to identify the deployment type
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            - name: IP_AUTODETECTION_METHOD
              value: "cidr=10.10.10.0/24"
            # Enable IPIP
            #- name: CALICO_IPV4POOL_IPIP
            #  value: "Never"
            # Enable or Disable VXLAN on the default IP pool.
            - name: CALICO_IPV4POOL_VXLAN
              value: "Always"
            # Set MTU for tunnel device used if ipip is enabled
            - name: FELIX_IPINIPMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # Set MTU for the VXLAN tunnel device.
            - name: FELIX_VXLANMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # Set MTU for the Wireguard tunnel device.
            - name: FELIX_WIREGUARDMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # The default IPv4 pool to create on startup if none exists. Pod IPs will be
            # chosen from this range. Changing this value after installation will have
            # no effect. This should fall within `--cluster-cidr`.
            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/16"
            # Disable file logging so `kubectl logs` works.
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            # Set Felix endpoint to host default action to ACCEPT.
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            # Disable IPv6 on Kubernetes.
            - name: FELIX_IPV6SUPPORT
              value: "false"
            # Set Felix logging to "info"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "info"
            - name: FELIX_HEALTHENABLED
              value: "true"
            - name: FELIX_FEATUREDETECTOVERRIDE
              value: "ChecksumOffloadBroken=true"
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 250m
          livenessProbe:
            exec:
              command:
                - /bin/calico-node
                - -felix-live
                #- -bird-live
            periodSeconds: 10
            initialDelaySeconds: 10
            failureThreshold: 6
          readinessProbe:
            exec:
              command:
                - /bin/calico-node
                - -felix-ready
                #- -bird-ready
            periodSeconds: 10
          volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /run/xtables.lock
              name: xtables-lock
              readOnly: false
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /var/lib/calico
              name: var-lib-calico
              readOnly: false
            - name: policysync
              mountPath: /var/run/nodeagent
            # For eBPF mode, we need to be able to mount the BPF filesystem at /sys/fs/bpf so we mount in the
            # parent directory.
            - name: sysfs
              mountPath: /sys/fs/
              # Bidirectional means that, if we mount the BPF filesystem at /sys/fs/bpf it will propagate to the host.
              # If the host is known to mount that filesystem already then Bidirectional can be omitted.
              mountPropagation: Bidirectional
            - name: cni-log-dir
              mountPath: /var/log/calico/cni
              readOnly: true
      volumes:
        # Used by calico-node.
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        - name: var-lib-calico
          hostPath:
            path: /var/lib/calico
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
        - name: sysfs
          hostPath:
            path: /sys/fs/
            type: DirectoryOrCreate
        # Used to install CNI.
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
        # Used to access CNI logs.
        - name: cni-log-dir
          hostPath:
            path: /var/log/calico/cni
        # Mount in the directory for host-local IPAM allocations. This is
        # used when upgrading from host-local to calico-ipam, and can be removed
        # if not using the upgrade-ipam init container.
        - name: host-local-net-dir
          hostPath:
            path: /var/lib/cni/networks
        # Used to create per-pod Unix Domain Sockets
        - name: policysync
          hostPath:
            type: DirectoryOrCreate
            path: /var/run/nodeagent
        # Used to install Flex Volume Driver
        - name: flexvol-driver-host
          hostPath:
            type: DirectoryOrCreate
            path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
---
Your Environment
- Calico version: v3.19.3
- Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes v1.21.2
- Operating System and version: RHEL 7.4, kernel 3.10.0-693.el7 (although this kernel is old, our other clusters run the same version without this issue)
- Link to your project (optional):
Do you know where that address (224.0.0.251) is coming from?
Do you have pods within the multicast IP range? Normally we'd only program routes to the VXLAN device for pod subnets, so I'm surprised to see that there at all.
@moonek any update on this one?
Couldn't solve it. The kernel is still generating 30-40 log lines per second. Nowhere is an IP in the 224 range being used.
$ kubectl get po -A -owide | grep 224
$
Nowhere is an IP in the 224 range being used.
Something must be programming that entry targeting the vxlan.calico interface, and Calico is correctly spotting that it doesn't belong there (because there are no pods with that IP on the network).
Calico only programs static entries with a known hardware address, so I'm guessing this entry is being put there by something in your cluster trying to talk multicast over the vxlan.calico interface.
Based on a quick google, I see those multicast IPs are registered: https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml#multicast-addresses-12
224.0.0.251 | mDNS
224.0.0.22 | IGMP
Although anything could be trying to access them. Are you running anything like that ^ in your cluster?
Might be worth a tcpdump to see what's sending traffic to those addresses.
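A capture along those lines might look like this (a sketch, not from the issue; the interface choice and 5-second timeout are arbitrary, and a real capture needs root):

```shell
# Watch for traffic to the two multicast groups from the felix logs, to
# identify which host or process is generating it. Degrades gracefully
# if tcpdump is unavailable.
filter='host 224.0.0.251 or host 224.0.0.22'
if command -v tcpdump >/dev/null 2>&1; then
    # -i any: all interfaces; -nn: no name/port resolution; stop after 5s
    timeout 5 tcpdump -i any -nn "$filter" || true
else
    echo "tcpdump not installed"
fi
```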
In parallel, Calico should be able to remove those addresses. We might need to fine-tune the way we craft our deletion request so it's valid, or add a way to skip entries that we know aren't programmed by us (currently we use the vxlan.calico interface to decide that, but if someone else is programming entries for that interface we can't tell the difference).
I had the same issue too.
@caseydavenport if we can find the source of the multicast IPs, we can solve it. I tried to delete the 224.* neighbor entries, but it failed.
For some additional context, we are also seeing this issue on our Ubuntu 16.04 desktop nodes. These nodes are from a legacy deployment, and their host OS and kernel unfortunately cannot be upgraded.
After digging into those addresses, it appears that the avahi daemon is programming the mDNS (224.0.0.251) address, and systemd-resolved is programming the LLMNR (224.0.0.252) address. I have added an interface whitelist for avahi restricted to the primary network interface, and that address no longer appears in the neighbor list, but I'm still working on figuring out how to disable the others.
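For anyone else hitting this, the avahi change mentioned above amounts to a one-line config edit; a sketch of what worked for me ("eth0" is an example, substitute your primary NIC):

```ini
; /etc/avahi/avahi-daemon.conf  (then: systemctl restart avahi-daemon)
[server]
; Restrict avahi to the primary NIC so it stops programming mDNS
; neighbor entries on other interfaces such as vxlan.calico.
allow-interfaces=eth0
```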
This post was enlightening and described the addresses and their likely sources: https://superuser.com/questions/1063676/is-my-arp-cache-poisoned
This issue helped me whitelist the primary network interface for the avahi daemon without disabling it entirely: https://github.com/freepn/fpnd/issues/67
I don't believe any of these workarounds are real solutions, though.