
VXLAN: bad UDP checksums

Open maxpain opened this issue 3 years ago • 6 comments

Environment:

  • Cloud provider or hardware configuration: Virtual machine

  • OS:

Linux 5.15.0-37-generic x86_64
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Kubespray version: v2.19.0

Network plugin used: calico with vxlan

The problem: when using Calico with VXLAN as the tunnel, there is no connectivity between containers on different nodes:

image

Workaround:

sudo ethtool -K vxlan.calico tx-checksum-ip-generic off

or

featureDetectOverride: "ChecksumOffloadBroken=true"

However, disabling checksum offload has a performance impact.
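For reference, a minimal sketch of how the offload state could be checked before and after applying the first workaround. The helper name `csum_offload_state` is hypothetical; it only parses the `feature: state` lines that `ethtool -k` prints. Keep in mind that a setting changed with `ethtool -K` does not survive a reboot, and that the `calicoctl` line assumes the FelixConfiguration resource is named `default`.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the state ("on" or "off") of the tx-checksum-ip-generic feature.
csum_offload_state() {
    awk -F': *' '$1 ~ /tx-checksum-ip-generic$/ { split($2, a, " "); print a[1] }'
}

# Usage on a node (interface name taken from this issue):
#   ethtool -k vxlan.calico | csum_offload_state    # state before the change
#   sudo ethtool -K vxlan.calico tx-checksum-ip-generic off
#   ethtool -k vxlan.calico | csum_offload_state    # should now print "off"
#
# Second workaround, set through Felix (assumes the resource is named "default"):
#   calicoctl patch felixconfiguration default \
#     --patch='{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}'
```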

Related issues: https://github.com/projectcalico/calico/issues/3145 https://github.com/projectcalico/calico/issues/4865 https://github.com/flannel-io/flannel/issues/1279 https://github.com/rancher/rke2/issues/1541

maxpain avatar Jun 16 '22 23:06 maxpain

This is usually caused by a particular combination of kernel version, driver, and firmware version. Could you give us the output of ethtool -i <eth0> (the name might not be eth0) and more details on the hypervisor? Side note: with hardware offload it's normal to see bad checksums on outgoing packets in tcpdump, so when investigating, only look at incoming packets (tcpdump -Q in ...).
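A rough sketch of that check, run on the receiving node. The interface name `eth0` and UDP port 4789 (Calico's default VXLAN port) are assumptions, and `count_bad_udp_cksum` is a hypothetical helper that counts the lines where `tcpdump -vv` reports a failed UDP checksum.

```shell
#!/bin/sh
# Hypothetical helper: count lines of `tcpdump -vv` output (read from stdin)
# that report a bad UDP checksum.
count_bad_udp_cksum() {
    grep -c 'bad udp cksum' || true   # grep exits 1 on zero matches; still prints 0
}

# Usage (assumed interface eth0 and VXLAN port 4789; captures 200 packets):
#   sudo tcpdump -ni eth0 -Q in -vv -c 200 udp port 4789 2>/dev/null | count_bad_udp_cksum
```

A non-zero count on incoming traffic would suggest the checksums really are wrong on the wire, rather than just not yet filled in by the sender's offload.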

champtar avatar Jun 17 '22 01:06 champtar

@champtar

driver: vmxnet3
version: 1.6.0.0-k-NAPI
firmware-version:
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

maxpain avatar Jun 17 '22 02:06 maxpain

> Side note, with hardware offload it's normal to have bad checksum on outgoing packets

Yes, but these packets are not sent.

maxpain avatar Jun 17 '22 02:06 maxpain

It would be worth a shot to open a VMware ticket.

champtar avatar Jun 17 '22 02:06 champtar

Agreed with @champtar, this looks more like a kernel/hypervisor issue. You can also reach out to the Project Calico folks with the details above, since kubespray only deploys Calico and does not modify it in any way.

cristicalin avatar Jun 17 '22 17:06 cristicalin

~~This issue suggests it's a recent kernel driver update that's causing the issue:~~ I didn't spot champtar in this thread, and I didn't properly look at the originally linked issues. Leaving this here as another reference: https://github.com/projectcalico/calico/issues/4727

I haven't confirmed via tcpdump, but I have been struggling with Calico VXLAN on Ubuntu 20.04 with the HWE 5.13 kernel. VXLAN works fine on the standard 5.4 kernel. Redeploying in IP-in-IP mode makes things work on the 5.13 kernel.

DomHoney avatar Jul 11 '22 16:07 DomHoney

I have the same issue on Rocky Linux 9. Kernel version: 5.14.0-70.13.1.el9_0.x86_64

yankay avatar Sep 02 '22 08:09 yankay

I finally switched from Kubespray to Talos OS, and I no longer have any OS configuration problems.

maxpain avatar Sep 02 '22 08:09 maxpain