Cilium agents fail to start due to mount permissions with Cilium v1.12.0 (likely upstream issue)
Bug Report
Description
I created a new cluster without CNI by adding

```
--config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op": "add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
```

to `talosctl gen config`. After running `talosctl bootstrap`, deploying Cilium with Helm using

```
helm install cilium cilium/cilium --namespace kube-system --set ipam.mode=kubernetes --set kubeProxyReplacement=strict --set k8sServiceHost="master1.lan" --set k8sServicePort="6443"
```

results in Cilium initialization never completing. While the operators start up, all workers end up in `CrashLoopBackOff` trying to run the command

```
sh -ec 'cp /usr/bin/cilium-mount /hostbin/cilium-mount;
  nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
  rm /hostbin/cilium-mount'
```

which results in

```
mount-cgroup nsenter: failed to execute /opt/cni/bin/cilium-mount: Permission denied
```
This is despite the file permissions looking correct:

```
$ talosctl list opt/cni/bin/ -l
NODE          MODE         UID   GID   SIZE(B)   LASTMOD           NAME
master1.lan   drwxr-xr-x   0     0     26        Jul 20 13:38:19   .
master1.lan   -rwxr-xr-x   0     0     3424256   Jul 20 14:33:04   cilium-mount
```
So it seems like something else (namespaced mounts?) is blocking this. Deploying Cilium did work with Talos v1.0, but I haven't yet found the commit that broke the support. Let me know how I can debug this further or what other logs I can look at.
Update: this is likely an upstream issue due to insufficient privileges for running `mount`. It can be worked around by passing `--set securityContext.privileged=true` to Helm (which restores the pre-v1.12 behavior).
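For reference, a sketch of the full workaround invocation: this is simply the install command from the report above with the one extra `--set` appended (the hostname and port are specific to this cluster).

```shell
# Workaround sketch: same install as in the report, plus
# securityContext.privileged=true to restore the pre-v1.12
# always-privileged behavior of the chart.
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost="master1.lan" \
  --set k8sServicePort="6443" \
  --set securityContext.privileged=true
```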
Environment
- Talos version:

  ```
  Client:
      Tag:         v1.1.1
      SHA:         40a050c6
      Go version:  go1.18.4
      OS/Arch:     linux/amd64
  Server:
      NODE:        master1.lan
      Tag:         v1.2.0-alpha.0-43-g56a757cc8
      SHA:         56a757cc
      Go version:  go1.18.4
      OS/Arch:     linux/amd64
      Enabled:     RBAC
  ```

- Kubernetes version:

  ```
  Client Version: v1.24.0
  Kustomize Version: v4.5.4
  Server Version: v1.24.2
  ```

- Platform: Proxmox (nocloud)
@twelho this is not a Talos issue: if you look at the diff between the 1.11.7 and 1.12.0 versions of the cilium Helm chart, they changed the default value of `securityContext.privileged` from `true` to `false`. Even though Cilium adds the `SYS_ADMIN` capability, that is not enough for `mount` operations; you would also need to set `privileged: true` in the pod securityContext. This can be fixed by adding `--set securityContext.privileged=true` when doing the Helm install. The Talos docs for Cilium should still work, as they pin Cilium to version 1.11.2.

I'm also waiting on cilium/cilium to publish an official 1.12.0 release to see if they mention anything related to this in the release notes; they are still not updated yet.
Yes, I managed to figure that out as well, but didn't have time to respond :sweat_smile: In the v1.12.0 Helm chart the `mount-cgroup` init container `securityContext` has been changed to
```yaml
securityContext:
  {{- if .Values.securityContext.privileged }}
  privileged: true
  {{- else }}
  seLinuxOptions:
    level: 's0'
    # Running with spc_t since we have removed the privileged mode.
    # Users can change it to a different type as long as they have the
    # type available on the system.
    type: 'spc_t'
  capabilities:
    drop:
      - ALL
    add:
      # Only used for 'mount' cgroup
      - SYS_ADMIN
      # Used for nsenter
      - SYS_CHROOT
      - SYS_PTRACE
  {{- end}}
```
which, like you said, doesn't grant enough permissions to mount with the default `securityContext.privileged=false`. The old behavior is equivalent to `securityContext.privileged=true`, which works in Talos without issues.

Should this be considered a Cilium bug? v1.12.0 was technically released yesterday and is pointed to by both the docs and the Helm charts now, but the release is indeed missing from GitHub...
I assume it's a Cilium bug, unless I'm missing some information. I was waiting to see if someone else also reported it, just to check whether we missed something on the Talos side. The v1.12.0 release still doesn't have any release notes, so that's another thing I'm waiting on.
Just for reference, in Cilium v1.11.7 the whole `securityContext.privileged` option is absent, and the security context for the `mount-cgroup` container simply states

```yaml
securityContext:
  privileged: true
```
As this escapes to the host namespace via `nsenter`, I would consider it to be evil already, so no more evil in `privileged` :)

What could be interesting to do with Talos is repackaging all Cilium CNI plugins as a Talos system extension; then it probably wouldn't need to `nsenter` into the host at all (?).
@twelho this is indeed a Talos issue: the `/opt` directory is mounted without any permissions attached, which causes the permission denied.

```
d--------- 1 root root 17 Jul 20 21:43 opt
```

Will try to see what the right fix is; I wrongly assumed the `cilium-mount` binary needed full mount permissions.
Cilium still fails to start with the fix from #5953, since it requests the `SYS_MODULE` capability, which is blocked on Talos for all processes except machined. The fix is to still run the pods as privileged.

Created https://github.com/cilium/cilium/issues/20636 to track upstream.
Had a repro of the `SYS_MODULE` capability issue as of Cilium 1.12.1 (current).

The quick fix was to go into `kubectl edit -n kube-system daemonset cilium` and manually edit out both mentions of `SYS_MODULE` as listed above. After I had done this, the cilium agent DaemonSet deployed successfully.
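One way to script that manual edit is a quick hack (not an official fix) that strips the offending lines from the live manifest and re-applies it. It assumes `SYS_MODULE` only appears in the rendered DaemonSet as capability list entries of the form `- SYS_MODULE`:

```shell
# Quick hack: delete every line mentioning SYS_MODULE from the live
# DaemonSet manifest and re-apply it. Assumes SYS_MODULE only appears
# as "- SYS_MODULE" capability entries in the rendered YAML.
kubectl -n kube-system get daemonset cilium -o yaml \
  | sed '/SYS_MODULE/d' \
  | kubectl apply -f -
```

Note that a subsequent `helm upgrade` will reintroduce the capability, so this only papers over the problem until the chart values are changed.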
The main symptom is the `cilium` DaemonSet pod in the `Init:CrashLoopBackOff` state. Checking `kubectl get pods -o yaml daemonset/cilium` shows:
```yaml
- containerID: containerd://169ad8f01a02a9deed6be9dec819ad78a9fcdaa2f32d3df5dcd2077fee2533dc
  image: sha256:526bd4754c9cd45a9602873f814648239ebf8405ea2b401f5e7a3546f7310d88
  imageID: quay.io/cilium/cilium@sha256:ea2db1ee21b88127b5c18a96ad155c25485d0815a667ef77c2b7c7f31cab601b
  lastState:
    terminated:
      containerID: containerd://169ad8f01a02a9deed6be9dec819ad78a9fcdaa2f32d3df5dcd2077fee2533dc
      exitCode: 128
      finishedAt: "2022-08-30T10:03:57Z"
      message: 'failed to create containerd task: failed to create shim task: OCI
        runtime create failed: runc create failed: unable to start container process:
        unable to apply caps: operation not permitted: unknown'
      reason: StartError
      startedAt: "1970-01-01T00:00:00Z"
  name: clean-cilium-state
  ready: false
  restartCount: 4
  state:
    waiting:
      message: back-off 1m20s restarting failed container=clean-cilium-state pod=cilium-zcsp8_kube-system(71f770af-ce87-4b45-9996-8b6bd9f82365)
      reason: CrashLoopBackOff
```
That initContainer has these capabilities:

```yaml
securityContext:
  capabilities:
    add:
      - NET_ADMIN
      - SYS_MODULE
      - SYS_ADMIN
      - SYS_RESOURCE
    drop:
      - ALL
```
The `SYS_MODULE` culprit is defined in two places in the Helm chart's cilium agent DaemonSet. The good news, therefore, is that this is mainly a configuration issue of the Helm chart itself, as opposed to something that would need a code change.

Correct, it's the explicit requesting of the `SYS_MODULE` capability that throws permission denied, since Talos drops the permission to load modules.
I tried disabling `SYS_MODULE` in the config; the deployment of Cilium went fine, however CoreDNS wasn't able to reach the cluster IP. I did a lot of troubleshooting and could only find some SYN packets being sent; it seems the node wasn't aware of the service IPs. The service IP was pointing to the node IP, which was reachable. I ended up adding this to the Cilium config, and the cluster is running fine now:

```yaml
securityContext:
  privileged: true
```
I am looking into getting Cilium deployed (on top of a Talos cluster)... so it might just be a matter of updating the Talos doc for deploying with the Cilium CNI? (I will drop some notes here in case I wander off before I make it to the end of getting it working.)

I noticed that there is a new Cilium release, i.e. 1.13.0 => https://github.com/cilium/cilium/releases/tag/v1.13.0. Also, the issue with privileged being required seems to be solved => https://github.com/cilium/cilium/issues/20636, with a claim that it now works on Talos specifically => https://github.com/cilium/cilium/pull/21506#issuecomment-1265319556

Also, this seems to be closed, but I'm unsure if it is actually fixed (or might not be a problem anymore?) => https://github.com/cilium/cilium-cli/pull/635
Deploying with a patch.yaml (no CNI & no kube-proxy):

```yaml
cluster:
  proxy:
    disabled: true
  network:
    cni:
      name: "none"
```

```
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium \
  --version 1.13.0 \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=poc.example.se \
  --set k8sServicePort=6443
```
This causes a crash loop... however, after removing all mentions of `SYS_MODULE` via `kubectl edit daemonset.apps/cilium`, I get
```
# kubectl get all
NAME                                                READY   STATUS    RESTARTS        AGE
pod/cilium-5q6wp                                    1/1     Running   0               88s
pod/cilium-65l4t                                    1/1     Running   0               88s
pod/cilium-7tgtn                                    1/1     Running   0               88s
pod/cilium-hs5qh                                    1/1     Running   0               88s
pod/cilium-l2xpt                                    1/1     Running   0               88s
pod/cilium-lsbtk                                    1/1     Running   0               87s
pod/cilium-operator-7fc78cbbdb-7nhvr                1/1     Running   64 (7m18s ago)  6h28m
pod/cilium-operator-7fc78cbbdb-kc47v                1/1     Running   53 (6m15s ago)  6h28m
pod/coredns-5597575654-c8lps                        0/1     Running   0               6h29m
pod/coredns-5597575654-hcs5z                        0/1     Running   0               6h29m
pod/kube-apiserver-talos-control-plane-0            1/1     Running   0               8m21s
pod/kube-apiserver-talos-control-plane-1            1/1     Running   0               4m30s
pod/kube-apiserver-talos-control-plane-2            1/1     Running   0               8m33s
pod/kube-controller-manager-talos-control-plane-0   1/1     Running   1 (8m38s ago)   8m21s
pod/kube-controller-manager-talos-control-plane-1   1/1     Running   1 (4m46s ago)   4m30s
pod/kube-controller-manager-talos-control-plane-2   1/1     Running   1 (8m34s ago)   8m32s
pod/kube-scheduler-talos-control-plane-0            1/1     Running   1 (8m38s ago)   8m22s
pod/kube-scheduler-talos-control-plane-1            1/1     Running   1 (4m46s ago)   4m30s
pod/kube-scheduler-talos-control-plane-2            1/1     Running   1 (8m34s ago)   8m32s

NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/hubble-peer   ClusterIP   10.96.135.108   <none>        443/TCP                  6h28m
service/kube-dns      ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   6h29m

NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/cilium   6         6         6       6            6           kubernetes.io/os=linux   6h28m

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cilium-operator   2/2     2            2           6h28m
deployment.apps/coredns           0/2     2            0           6h29m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/cilium-operator-7fc78cbbdb   2         2         2       6h28m
replicaset.apps/coredns-5597575654           2         2         0       6h29m
```
Cilium is still broken for kube-proxyless installs (with `privileged` set to `false`), see: https://github.com/cilium/cilium/issues/21603
Is this still an active issue? I wasn't able to reproduce it using Cilium 1.14.3 with Talos 1.5.3. I tested with Cilium's kube-proxy replacement both enabled and disabled. At the same time, `securityContext.privileged=true` should not be required anymore because of https://github.com/cilium/cilium/pull/21506 (also see Talos' Cilium installation guide, where explicitly configured Linux capabilities are used: https://www.talos.dev/v1.5/kubernetes-guides/network/deploying-cilium/#method-1-helm-install).
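For completeness, a rough sketch of what that capability-based (non-privileged) install looks like. The exact capability lists and cgroup settings below are an approximation of the linked guide, not an authoritative copy; verify them against the guide before use.

```shell
# Sketch of a capabilities-based install (no privileged mode), roughly
# following the Talos Cilium guide. Capability lists are approximate;
# check the guide for the authoritative values.
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=strict \
  --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
  --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
  --set cgroup.autoMount.enabled=false \
  --set cgroup.hostRoot=/sys/fs/cgroup
```

Note the absence of `SYS_MODULE` in both lists, which is what makes this work under Talos's module-loading restrictions.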
Yes, this is outdated; Talos runs an integration test with Cilium in both kube-proxy and kube-proxy-less modes.