kubespray
VXLAN: bad UDP checksums
Environment:
- Cloud provider or hardware configuration: Virtual machine
- OS:
Linux 5.15.0-37-generic x86_64
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
- Kubespray version: v2.19.0
- Network plugin used: Calico with VXLAN
The problem: When using Calico with VXLAN as the tunnel, there is no connectivity between containers on different nodes.
Workaround:
sudo ethtool -K vxlan.calico tx-checksum-ip-generic off
or
featureDetectOverride: "ChecksumOffloadBroken=true"
But this has a performance impact, since checksums are then computed in software instead of being offloaded.
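For anyone wanting to try either workaround, here is a minimal sketch of how they could be applied. The `run` helper and the `DRY_RUN` variable are hypothetical names added for illustration so the commands can be previewed first. Using `calicoctl patch` against the `default` FelixConfiguration is an assumption about how Calico is managed in your cluster; the `ethtool` setting is per node and does not survive a reboot.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints each command instead of
# running it; DRY_RUN and run() are illustrative helpers, not part of any tool.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Workaround 1: turn off TX checksum offload on Calico's VXLAN device.
# Has to be done on every node, and again after a reboot.
disable_vxlan_tx_csum() {
    run ethtool -K vxlan.calico tx-checksum-ip-generic off
}

# Workaround 2: tell Felix to treat checksum offload as broken, cluster-wide.
# Assumes calicoctl is installed and the FelixConfiguration is named "default".
set_felix_override() {
    run calicoctl patch felixconfiguration default --type=merge \
        --patch='{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}'
}
```

Source the file and call `disable_vxlan_tx_csum` or `set_felix_override` to preview the command, then set `DRY_RUN=0` to actually run it. Only one of the two should be needed.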
Related issues: https://github.com/projectcalico/calico/issues/3145 https://github.com/projectcalico/calico/issues/4865 https://github.com/flannel-io/flannel/issues/1279 https://github.com/rancher/rke2/issues/1541
This is usually caused by a combination of kernel version, driver, and firmware version. Could you give us the output of ethtool -i <eth0> (the interface name might not be eth0) and more details on the hypervisor?
Side note: with hardware offload it's normal to see bad checksums on outgoing packets in tcpdump, so when investigating, only look at incoming packets (tcpdump -Q in ...).
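A rough sketch of the incoming-only check described above, in case it helps. The function name `count_bad_udp_cksum` and the interface name `ens192` are assumptions for illustration; UDP port 4789 is Calico's default VXLAN port, so adjust it if your cluster overrides it.

```shell
# Count the packets that tcpdump flagged with a bad UDP checksum.
# Reads tcpdump's verbose text output on stdin and prints the count.
count_bad_udp_cksum() {
    # grep -c exits non-zero when the count is 0; that is not an error here.
    grep -c 'bad udp cksum' || true
}

# Example usage on the receiving node (captures 200 incoming VXLAN packets):
#   sudo tcpdump -Q in -vv -n -c 200 -i ens192 udp port 4789 | count_bad_udp_cksum
```

A non-zero count on the receiving side would suggest the checksums really are wrong on the wire, rather than just appearing bad because of offload on the sender.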
@champtar
driver: vmxnet3
version: 1.6.0.0-k-NAPI
firmware-version:
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
Side note, with hardware offload it's normal to have bad checksum on outgoing packets
Yes, but these packets are not being sent.
It would be worth a shot to open a VMware ticket.
Agreed with @champtar, this looks more like a kernel/hypervisor issue. You can also reach out to the Project Calico folks with the details above, since Kubespray only deploys Calico and does not modify it in any way.
~~This issue suggests it's a recent kernel driver update that's causing the issue:~~ I didn't spot champtar in this thread, and I didn't properly look at the originally linked issues. Leaving this here as another reference: https://github.com/projectcalico/calico/issues/4727
I haven't confirmed this via tcpdump, but I have been struggling with Calico VXLAN on Ubuntu 20.04 with the HWE 5.13 kernel. VXLAN works fine on the standard 5.4 kernel. Redeploying in IP-in-IP mode makes things work on the 5.13 kernel.
I have the same issue on Rocky Linux 9. Kernel version: 5.14.0-70.13.1.el9_0.x86_64
I finally switched from Kubespray to Talos OS, and I no longer have any OS configuration problems.