kubespray

Cilium doesn't work on Ubuntu 22.04

Status: Open. maxpain opened this issue 2 years ago • 7 comments

Environment:

  • Cloud provider or hardware configuration: Bare metal

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 5.15.0-40-generic x86_64
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
  • Version of Ansible (ansible --version): 2.10.15

  • Version of Python (python --version): 3.9.13

Kubespray version (commit) (git rev-parse --short HEAD): 58e9324

Network plugin used: Cilium

On Ubuntu 22.04 we need to disable rp_filter for Cilium to work.

https://github.com/cilium/cilium/issues/18131#issuecomment-988160016
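To see what a node is actually doing, the Python sketch below reads the per-interface rp_filter values from /proc/sys/net/ipv4/conf and applies the rule from the kernel's ip-sysctl documentation: the mode used for source validation on an interface is the maximum of conf/all/rp_filter and conf/<dev>/rp_filter (0 = off, 1 = strict, 2 = loose). The function names are hypothetical; this is an illustration, not part of Kubespray or Cilium.

```python
from __future__ import annotations

from pathlib import Path

# rp_filter modes (kernel ip-sysctl documentation): 0 = off, 1 = strict, 2 = loose.
CONF_DIR = Path("/proc/sys/net/ipv4/conf")


def effective_rp_filter(all_value: int, dev_value: int) -> int:
    """The kernel validates with max(conf/all/rp_filter, conf/<dev>/rp_filter)."""
    return max(all_value, dev_value)


def effective_rp_filters(conf_dir: Path = CONF_DIR) -> dict[str, int]:
    """Map every interface under conf_dir to the rp_filter mode actually in force."""
    all_value = int((conf_dir / "all" / "rp_filter").read_text())
    result: dict[str, int] = {}
    for dev_dir in sorted(conf_dir.iterdir()):
        if dev_dir.name in ("all", "default"):
            continue  # "all" and "default" are not real interfaces
        dev_value = int((dev_dir / "rp_filter").read_text())
        result[dev_dir.name] = effective_rp_filter(all_value, dev_value)
    return result
```

On a node where an rp_filter override has taken effect, the lxc* and cilium_* entries returned by effective_rp_filters() would be expected to read 0; a non-zero value there suggests that conf.all or another sysctl file is still winning.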

maxpain, Jun 28 '22 08:06

@maxpain Thanks for submitting this issue. Could you provide the actual error message from Kubespray so we can see which task failed on Ubuntu 22.04?

oomichi, Jun 28 '22 21:06

This is probably related to Ubuntu configuring the setting in two different places: https://github.com/cilium/cilium/issues/20125#issuecomment-1185176384

Cilium simply will not work with the default Ubuntu settings (one issue also mentions RHEL 9). Just dropping a config file in the usual place to override the values is not enough.

protosam, Jul 15 '22 05:07

This is actually fixed in the latest Cilium pre-release, v1.12.0-rc3.

maxpain, Jul 15 '22 05:07

This doesn't affect me, and I don't use Kubespray at all; I'm just aiming to be a friendly neighborhood nerd here.

The fix referred to above is https://github.com/cilium/cilium/pull/20072, included in the latest Cilium pre-release v1.12.0-rc3.

I see nothing in the Cilium repo that patches /usr/lib/sysctl.d, which is necessary to "fix" this on Ubuntu 22.04 or newer if the goal is to override the defaults.

protosam, Jul 18 '22 22:07

@protosam see https://github.com/cilium/cilium/pull/20072/files#diff-1cadee1ea10bb25d793baf555b85040a00ff0bc7f049a2542f0c2590ab4e7f0fR39-R45

maxpain, Jul 18 '22 23:07

Not sure if this is enough to "fix" the problem. What I see is as follows:

  • sysctlConfig is a variable holding the contents of the file to be written.
  • Whatever file overwritesPath points to is written with the contents of sysctlConfig.
  • By default this resolves to a single file, /etc/sysctl.d/99-zzz-override_cilium.conf, because the path is built with path.Join(*sysctlD, *ciliumOverwrites).
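A Python sketch of the behaviour described in the bullets above, assuming the defaults named there: the directory and file name are joined (posixpath.join is a close Python analogue of Go's path.Join) and the whole sysctlConfig string is written to that one file. The constants and function names are hypothetical stand-ins for Cilium's Go flags, not Cilium code.

```python
import posixpath
from pathlib import Path

# Hypothetical defaults mirroring *sysctlD and *ciliumOverwrites from the comment above.
SYSCTL_D = "/etc/sysctl.d"
CILIUM_OVERWRITES = "99-zzz-override_cilium.conf"


def overwrites_path(sysctl_d: str = SYSCTL_D, name: str = CILIUM_OVERWRITES) -> str:
    """Rough equivalent of Go's path.Join(sysctlD, ciliumOverwrites)."""
    return posixpath.join(sysctl_d, name)


def write_overwrites(sysctl_config: str, sysctl_d: str = SYSCTL_D,
                     name: str = CILIUM_OVERWRITES) -> Path:
    """Write sysctl_config into exactly one file; /usr/lib/sysctl.d is never touched."""
    target = Path(overwrites_path(sysctl_d, name))
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(sysctl_config)
    return target
```

The point of the sketch is the last docstring: only one file under /etc/sysctl.d is written, so nothing in /usr/lib/sysctl.d/50-default.conf is changed.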

So /etc/sysctl.d/99-zzz-override_cilium.conf ends up with the following contents:

# Disable rp_filter on Cilium interfaces since it may cause mangled packets to be dropped
net.ipv4.conf.lxc*.rp_filter = 0
net.ipv4.conf.cilium_*.rp_filter = 0
# The kernel uses max(conf.all, conf.{dev}) as its value, so we need to set .all. to 0 as well.
# Otherwise it will overrule the device specific settings.
net.ipv4.conf.all.rp_filter = 0
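Keys such as net.ipv4.conf.lxc*.rp_filter are glob patterns matched against interface names. The Python sketch below parses sysctl.d-style text and reports which value would apply to a given interface, using fnmatch and a simple "last matching line wins" rule. That rule is a deliberate simplification for illustration: it does not reproduce systemd-sysctl's exact precedence between files, glob keys, and explicit keys, and the function names are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def parse_sysctl_conf(text: str) -> list[tuple[str, str]]:
    """Parse sysctl.d-style text into (key, value) pairs, in file order."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignored in this sketch
        # A leading "-" only means "ignore errors when applying"; drop it here.
        entries.append((key.strip().lstrip("-"), value.strip()))
    return entries


def rp_filter_for(ifname: str, entries: list[tuple[str, str]]) -> str | None:
    """Value assigned to ifname by the last matching key (simplified precedence)."""
    target = f"net.ipv4.conf.{ifname}.rp_filter"
    value = None
    for pattern, val in entries:
        if fnmatchcase(target, pattern):
            value = val
    return value
```

Under this simplified model, appending the Cilium file after Ubuntu's `net.ipv4.conf.*.rp_filter = 2` would give 0 for an interface like lxc1234; the behaviour reported on a real Ubuntu 22.04 node may differ, which is why checking the live values in /proc is still worthwhile.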

In my testing, the contents of /usr/lib/sysctl.d/50-default.conf seem to win out over any modifications made in /etc/sysctl.d/*.conf.

Thanks, Canonical:

root@localhost:~# grep rp_fi /usr/lib/sysctl.d/50-default.conf 
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.*.rp_filter = 2
#-net.ipv4.conf.all.rp_filter # Ubuntu uses /etc/sysctl.d/10-network-security.conf

In my own solutions, I go so far as to purge 50-default.conf of rp_filter entries.
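A minimal Python sketch of that purge: it returns the text of a sysctl.d file with every active rp_filter assignment removed, leaving comments and unrelated keys intact. The function name and regex are hypothetical. Editing /usr/lib/sysctl.d/50-default.conf in place may be undone by package upgrades; per sysctl.d(5), a file with the same name in /etc/sysctl.d takes precedence over the one in /usr/lib/sysctl.d, so writing the purged text to /etc/sysctl.d/50-default.conf is one alternative.

```python
import re

# Matches an active (uncommented) assignment to any rp_filter key, including
# glob keys such as net.ipv4.conf.*.rp_filter, "-"-prefixed keys, and "/" separators.
RP_FILTER_LINE = re.compile(r"^\s*-?[\w.*/-]*[./]rp_filter\s*=")


def purge_rp_filter(text: str) -> str:
    """Return text with active rp_filter assignments removed; comments are kept."""
    kept = [line for line in text.splitlines(keepends=True)
            if not RP_FILTER_LINE.match(line)]
    return "".join(kept)
```

As a usage example (run as root), the purged vendor file could be written with Path("/etc/sysctl.d/50-default.conf").write_text(purge_rp_filter(Path("/usr/lib/sysctl.d/50-default.conf").read_text())) and then loaded with `sysctl --system`.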

protosam, Jul 19 '22 01:07

This is fixed in Cilium >= 1.9.18, >= 1.10.13, and >= 1.11.7. Thank you.

aanm, Jul 29 '22 20:07

It looks like /roles/network_plugin/cilium/templates/cilium/ds.yml.j2 is out of sync with https://github.com/cilium/cilium/blob/v1.11.7/install/kubernetes/cilium/templates/cilium-agent/daemonset.yaml. As a result, the apply-sysctl-overwrites init container is missing, and the sysctl fix that is part of Cilium v1.11.7 is not applied at all.

There are other discrepancies between the two files that are not relevant to this issue.
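To confirm whether a deployed or rendered Cilium DaemonSet actually carries the apply-sysctl-overwrites init container, a small Python check over the parsed manifest can help. It assumes the manifest has already been loaded into a dict, for example via json.loads on the output of `kubectl -n kube-system get ds cilium -o json` (assuming the DaemonSet is named cilium in kube-system). The function names are hypothetical; spec.template.spec.initContainers is the standard Kubernetes DaemonSet layout.

```python
from __future__ import annotations

from typing import Any


def init_container_names(manifest: dict[str, Any]) -> list[str]:
    """Names of the init containers in a DaemonSet manifest (empty if none)."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    return [c.get("name", "") for c in pod_spec.get("initContainers") or []]


def has_sysctl_overwrites(manifest: dict[str, Any]) -> bool:
    """True if the init container that applies Cilium's sysctl fix is present."""
    return "apply-sysctl-overwrites" in init_container_names(manifest)
```

With a template that is out of sync, has_sysctl_overwrites() would be expected to return False even though the Cilium image itself contains the fix.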

sutinski, Aug 16 '22 19:08

You're correct. Cilium manifests change rapidly, so I'll update Cilium to v1.12 with #9187 (hopefully without breaking backward compatibility) and will also try to look into these discrepancies.

necatican, Aug 17 '22 08:08