k3s
missing 'flanneld masq' rules
Environmental Info:
k3s version v1.22.13+k3s1 (3daf4ed4)
go version go1.16.10
Node(s) CPU architecture, OS, and Version:
Linux ip-172-31-15-30 5.4.0-1078-aws #84~18.04.1-Ubuntu SMP Fri Jun 3 12:59:49 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 1 server
Describe the bug: After upgrading k3s from v1.22.12 to v1.22.13 on my Ubuntu 18.04 server, I noticed that my pods couldn't reach the internet. After debugging, it turned out that the default masquerade rules are missing on v1.22.13. On Ubuntu 20.04 and 22.04 the rules are present and everything works as usual.
Steps To Reproduce:
- On a freshly installed Ubuntu 18.04 machine, install the latest k3s from the v1.22 channel
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -s - --write-kubeconfig-mode 644
- Wait for all pods to become ready (~30 sec), then check the default iptables masquerade rules
iptables -vnL -t nat |grep 'MASQUERADE all'
0 0 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 mark match 0x2000/0x2000
5 220 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */
- Then downgrade the k3s version to 1.22.12 and recheck the masquerade rules
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.22.12+k3s1 sh -s - --write-kubeconfig-mode 644
iptables -vnL -t nat |grep 'MASQUERADE all'
2 141 MASQUERADE all -- * * 10.42.0.0/16 !224.0.0.0/4 /* flanneld masq */
0 0 MASQUERADE all -- * * !10.42.0.0/16 10.42.0.0/16 /* flanneld masq */
0 0 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 mark match 0x2000/0x2000
0 0 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */
@manuelbuil any ideas? I suspect that perhaps some of the rules have changed between versions?
It seems so but I don't think we have changed anything in that area. Need to investigate
Are you using a specific config in /etc/rancher/k3s/config.yaml?
No, I'm doing exactly what I wrote in 'Steps To Reproduce': on a freshly installed Ubuntu 18.04, executing this command
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -
Can you give me the K3s logs?
Sure thing!
Something scary here:
Aug 30 13:48:11 ip-172-31-1-153 k3s[2242]: E0830 13:48:11.660256 2242 iptables.go:192] Failed to setup IPTables. iptables-restore binary was not found: no iptables-restore version found in string:
What do you get from running:
- k3s check-config
- iptables-restore --version
iptables-restore on Ubuntu 18.04 doesn't have the --version argument.
root@ip-172-31-7-31:~# k3s check-config
Verifying binaries in /var/lib/rancher/k3s/data/8b526847b7c1f339a0878b18029f581027f82097c5128778df15dd816a058bda/bin:
- sha256sum: good
- links: good
System:
- /sbin iptables v1.6.1: older than v1.8
- swap: disabled
- routes: ok
Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
modprobe: FATAL: Module configs not found in directory /lib/modules/5.4.0-1078-aws
info: reading kernel config from /boot/config-5.4.0-1078-aws ...
Generally Necessary:
- cgroup hierarchy: cgroups Hybrid mounted, cpuset|memory controllers status: good
- /sbin/apparmor_parser
apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled
Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
- "overlay":
- CONFIG_VXLAN: enabled (as module)
Optional (for encrypted networks):
- CONFIG_CRYPTO: enabled
- CONFIG_CRYPTO_AEAD: enabled
- CONFIG_CRYPTO_GCM: enabled
- CONFIG_CRYPTO_SEQIV: enabled
- CONFIG_CRYPTO_GHASH: enabled
- CONFIG_XFRM: enabled
- CONFIG_XFRM_USER: enabled (as module)
- CONFIG_XFRM_ALGO: enabled (as module)
- CONFIG_INET_ESP: enabled (as module)
- CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
- "overlay":
- CONFIG_OVERLAY_FS: enabled (as module)
STATUS: pass
root@ip-172-31-7-31:~# iptables-restore --version
iptables-restore: unrecognized option '--version'
^C
root@ip-172-31-7-31:~# iptables-restore -V
iptables-restore: invalid option -- 'V'
^C
root@ip-172-31-7-31:~# man iptables-restore | tail -1
iptables 1.6.1 IPTABLES-RESTORE(8)
root@ip-172-31-7-31:~#
root@ip-172-31-7-31:~# iptables-restore --version
iptables-restore: unrecognized option '--version'
That is broken in a way that is incompatible with Kubernetes, since it runs iptables-restore --version to check the installed version. Reference: https://github.com/kubernetes/kubernetes/blob/release-1.22/pkg/util/iptables/iptables.go#L708
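The failure mode can be illustrated with a small Go sketch (a hypothetical illustration, not the actual Kubernetes code): the version is extracted from the command's output with a regex, and since iptables 1.6.1 rejects --version and produces no parsable output, the regex finds nothing, yielding exactly the "no iptables-restore version found in string" error seen in the log above.

```go
package main

import (
	"fmt"
	"regexp"
)

// parseVersion extracts a dotted version number from the output of
// `iptables-restore --version`. On iptables >= 1.6.2 the output looks like
// "iptables-restore v1.8.4 (legacy)"; on 1.6.1 the flag is unrecognized and
// the output contains no "vX.Y.Z" token, so no match is found.
func parseVersion(out string) (string, error) {
	re := regexp.MustCompile(`v([0-9]+(\.[0-9]+)+)`)
	m := re.FindStringSubmatch(out)
	if m == nil {
		return "", fmt.Errorf("no iptables-restore version found in string: %q", out)
	}
	return m[1], nil
}

func main() {
	v, err := parseVersion("iptables-restore v1.8.4 (legacy)")
	fmt.Println(v, err) // 1.8.4 <nil>

	// Ubuntu 18.04 (iptables 1.6.1) prints only an "unrecognized option" error:
	_, err = parseVersion("iptables-restore: unrecognized option '--version'")
	fmt.Println(err)
}
```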
Where is this broken version of iptables-restore from?
The most interesting thing is why it works on 1.22.12 but not on 1.22.13.
root@ip-172-31-1-244:~# apt show iptables
Package: iptables
Version: 1.6.1-2ubuntu2
Priority: standard
Section: net
Origin: Ubuntu
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: Debian Netfilter Packaging Team <[email protected]>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 1635 kB
Depends: libip4tc0 (= 1.6.1-2ubuntu2), libip6tc0 (= 1.6.1-2ubuntu2), libiptc0 (= 1.6.1-2ubuntu2), libxtables12 (= 1.6.1-2ubuntu2), libc6 (>= 2.14), libnetfilter-conntrack3, libnfnetlink0
Suggests: kmod
Homepage: http://www.netfilter.org/
Task: standard, ubuntu-core
Supported: 5y
Download-Size: 269 kB
APT-Manual-Installed: no
APT-Sources: http://us-west-1.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
Description: administration tools for packet filtering and NAT
iptables is the userspace command line program used to configure
the Linux packet filtering ruleset. It is targeted towards system
administrators. Since Network Address Translation is also configured
from the packet filter ruleset, iptables is used for this, too. The
iptables package also includes ip6tables. ip6tables is used for
configuring the IPv6 packet filter
The latest version of Flannel, shipped with the latest K3s, checks the iptables-restore version. The previous Flannel version didn't perform that check, which is why it worked for you.
I suggest you check the installed iptables version and probably update it, since v1.8 is the required minimum; I don't know when they introduced the --version flag.
For me, staying on 1.22.12 on my old Ubuntu 18.04 servers is okay. I opened the issue to understand what's happening, because I expected k3s to work on any Linux without issues. Also, if you're not going to fix it, it may be necessary to add a preparation step for Ubuntu 18.04 somewhere in the docs.
OK, I checked the Flannel code.
The issue was introduced in the check of the iptables-restore version, which verifies whether it supports the --wait flag.
The --wait flag was introduced in version 1.6.2, together with the --version flag, so the check is useless as written.
I'll fix it in Flannel.
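The redundant check can be sketched in Go as follows (a hypothetical illustration, not the actual Flannel code). The point: deciding --wait support by comparing the probed version against 1.6.2 can never observe a version below 1.6.2, because any binary that old also lacks --version and fails the probe outright.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast162 reports whether a dotted version string is >= 1.6.2, the release
// that introduced both --wait and --version. Missing components are treated
// as zero (e.g. "1.6" < 1.6.2).
func atLeast162(version string) bool {
	want := []int{1, 6, 2}
	parts := strings.Split(version, ".")
	for i := 0; i < len(want); i++ {
		n := 0
		if i < len(parts) {
			n, _ = strconv.Atoi(parts[i])
		}
		if n != want[i] {
			return n > want[i]
		}
	}
	return true // exactly 1.6.2
}

func main() {
	// Unreachable in practice: a 1.6.1 binary never yields a parsable version,
	// so the probe fails before this comparison ever runs.
	fmt.Println(atLeast162("1.6.1")) // false
	fmt.Println(atLeast162("1.8.4")) // true
}
```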
Created a cluster using k3s v1.23.10+k3s1 on Ubuntu 18.04 and found pods in a CrashLoopBackOff state
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-7-228 Ready control-plane,master 51m v1.23.10+k3s1
ip-172-31-0-57 Ready <none> 42m v1.23.10+k3s1
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-d76bd69b-tr9q6 0/1 Running 0 45m
kube-system helm-install-traefik-8qbnv 1/1 Running 8 (8m10s ago) 45m
kube-system local-path-provisioner-6c79684f77-vqnt7 0/1 CrashLoopBackOff 12 (106s ago) 45m
kube-system metrics-server-7cd5fcb6b7-ksbt4 0/1 CrashLoopBackOff 12 (80s ago) 45m
kube-system helm-install-traefik-crd-5zb6g 0/1 CrashLoopBackOff 8 (13s ago) 45m
Logs from coredns and traefik
$ kubectl logs -n kube-system coredns-d76bd69b-tr9q6
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[... the "Still waiting on: kubernetes" message repeats ~30 more times ...]
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
$ kubectl logs -n kube-system helm-install-traefik-crd-5zb6g
if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
echo "KUBERNETES_SERVICE_HOST is using IPv6"
CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi
set +v -x
+ [[ '' != \t\r\u\e ]]
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 init --skip-refresh --client-only --stable-repo-url https://charts.helm.sh/stable/
+ tiller --listen=127.0.0.1:44134 --storage=secret
Creating /home/klipper-helm/.helm
Creating /home/klipper-helm/.helm/repository
Creating /home/klipper-helm/.helm/repository/cache
Creating /home/klipper-helm/.helm/repository/local
Creating /home/klipper-helm/.helm/plugins
Creating /home/klipper-helm/.helm/starters
Creating /home/klipper-helm/.helm/cache/archive
Creating /home/klipper-helm/.helm/repository/repositories.yaml
Adding stable repo with URL: https://charts.helm.sh/stable/
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ jq -r '.Releases | length'
++ timeout -s KILL 30 helm_v2 ls --all '^traefik-crd$' --output json
[main] 2022/09/01 22:05:53 Starting Tiller v2.17.0 (tls=false)
[main] 2022/09/01 22:05:53 GRPC listening on 127.0.0.1:44134
[main] 2022/09/01 22:05:53 Probes listening on :44135
[main] 2022/09/01 22:05:53 Storage driver is Secret
[main] 2022/09/01 22:05:53 Max history per release is 0
[storage] 2022/09/01 22:05:54 listing all releases with filter
+ V2_CHART_EXISTS=
+ [[ '' == \1 ]]
+ [[ '' == \v\2 ]]
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/traefik-crd.tgz.base64
+ CHART_PATH=/tmp/traefik-crd.tgz
+ [[ ! -f /chart/traefik-crd.tgz.base64 ]]
+ return
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
chart path is a url, skipping repo update
+ echo 'chart path is a url, skipping repo update'
+ helm_v3 repo remove stable
Error: no repositories configured
+ true
+ return
+ helm_update install
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
++ helm_v3 ls --all -f '^traefik-crd$' --namespace kube-system --output json
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
[storage/driver] 2022/09/01 22:06:24 list: failed to list: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%3DTILLER": dial tcp 10.43.0.1:443: i/o timeout
Error: Kubernetes cluster unreachable: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
+ LINE=
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ [[ install = \d\e\l\e\t\e ]]
+ [[ '' =~ ^(|null)$ ]]
+ [[ '' =~ ^(|null)$ ]]
+ echo 'Installing helm_v3 chart'
+ helm_v3 install traefik-crd https://10.43.0.1:443/static/charts/traefik-crd-10.19.300.tgz
Error: INSTALLATION FAILED: failed to download "https://10.43.0.1:443/static/charts/traefik-crd-10.19.300.tgz"
Validated on k3s v1.25.0-rc2+k3s1
Environment Details
Infrastructure
- [x] Cloud
- [ ] Hosted
Node(s) CPU architecture, OS, and Version:
Linux ip-172-31-47-225 5.4.0-1078-aws #84~18.04.1-Ubuntu SMP Fri Jun 3 12:59:49 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
Cluster Configuration:
1 server, 1 agent
Testing Steps
- Install k3s:
For replication:
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.24 sh -s - --write-kubeconfig-mode 644 --cluster-init --token test
For validation:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.25.0-rc2+k3s1 sh -s - --write-kubeconfig-mode 644 --cluster-init --token test
- Ensure cluster is up and running
Replication Results:
- k3s version used for replication:
$ k3s -v
k3s version v1.24.4+k3s1 (c3f830e9)
go version go1.18.1
- Check flannel version
$ sudo /var/lib/rancher/k3s/data/current/bin/flannel
CNI Plugin flannel version v0.19.1 (linux/amd64) commit c3f830e9b9ed8a4d9d0e2aa663b4591b923a296e built on 2022-08-25T03:45:26Z
- Search for default masquerade rules and received no results
$ sudo iptables -vnL -t nat |grep 'flanneld masq'
$
Validation Results:
- k3s version used for validation:
$ k3s -v
k3s version v1.25.0-rc2+k3s1 (1d42f46c)
go version go1.19
- Check flannel version
$ sudo /var/lib/rancher/k3s/data/current/bin/flannel
CNI Plugin flannel version v0.19.2 (linux/amd64) commit 1d42f46cda8b7603f43b4c3dcf1e8c13c5c6e014 built on 2022-09-07T20:23:36Z
- Search for default masquerade rules and observed results
$ sudo iptables -vnL -t nat |grep 'flanneld masq'
83 4980 RETURN all -- * * 10.42.0.0/16 10.42.0.0/16 /* flanneld masq */
1 45 MASQUERADE all -- * * 10.42.0.0/16 !224.0.0.0/4 /* flanneld masq */
0 0 RETURN all -- * * !10.42.0.0/16 10.42.0.0/24 /* flanneld masq */
0 0 MASQUERADE all -- * * !10.42.0.0/16 10.42.0.0/16 /* flanneld masq */
Thought I was going crazy... First time I've tried K3s for real, and I hit the same issue.
Ubuntu 18.04.6 LTS
$ k3s -v
k3s version v1.24.4+k3s1 (c3f830e9)
go version go1.18.1
$ sudo iptables -vnL -t nat |grep 'flanneld masq'
$
This should also be fixed on 1.24.4; maybe the tagged release wasn't updated with the newer commit. 1.24.5 should have the right version.
FWIW, https://get.k3s.io was pointing to v1.24.4+k3s1 by default as of yesterday evening for Ubuntu 18, and it definitely has the issue. When I manually set it to v1.25.0+k3s1, it started working fine.
I can confirm the issue is fixed in v1.24.5-rc1+k3s1 as well. You might want to promote it to stable soon, since the stable release currently points to v1.24.4+k3s1.
QA is still validating the release candidate. We will cut a full release and eventually promote it to stable when validation is complete.
@rancher-max should we close this out?