
Cilium agents fail to start due to mount permissions with Cilium v1.12.0 (likely upstream issue)

Open twelho opened this issue 1 year ago • 12 comments

Bug Report

Description

I created a new cluster without CNI by adding --config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]' to talosctl gen config.

After running talosctl bootstrap, deploying Cilium with Helm using

helm install cilium cilium/cilium --namespace kube-system --set ipam.mode=kubernetes --set kubeProxyReplacement=strict --set k8sServiceHost="master1.lan" --set k8sServicePort="6443"

results in Cilium initialization never completing. While the operators start up, all workers end up in CrashLoopBackOff trying to run the command

sh
-ec
cp /usr/bin/cilium-mount /hostbin/cilium-mount;
nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
rm /hostbin/cilium-mount

which results in

mount-cgroup nsenter: failed to execute /opt/cni/bin/cilium-mount: Permission denied

This is despite the file permissions looking to be correct:

$ talosctl list opt/cni/bin/ -l
NODE                               MODE         UID   GID   SIZE(B)   LASTMOD           NAME
master1.lan   drwxr-xr-x   0     0     26        Jul 20 13:38:19   .
master1.lan   -rwxr-xr-x   0     0     3424256   Jul 20 14:33:04   cilium-mount

So it seems like something else (namespaced mounts?) is blocking this. Deploying Cilium did work with Talos v1.0, but I haven't yet found the commit that broke the support. Let me know how I can debug this further or what other logs I can look at.

Update: Likely upstream issue due to insufficient privileges for running mount, can be worked around by passing --set securityContext.privileged=true to Helm (which restores the pre v1.12 behavior).
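
The workaround can also be kept in a Helm values file rather than on the command line. The following is a minimal sketch that mirrors the --set flags from the install command above and adds the privileged setting; the filename values.yaml is only an assumption for illustration.

```yaml
# values.yaml (hypothetical filename): same settings as the --set flags above
ipam:
  mode: kubernetes
kubeProxyReplacement: strict
k8sServiceHost: master1.lan
k8sServicePort: 6443
# Workaround: restores the pre-v1.12 privileged behavior
securityContext:
  privileged: true
```

It would be applied with helm install cilium cilium/cilium --namespace kube-system -f values.yaml.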

Environment

  • Talos version:
Client:
        Tag:         v1.1.1
        SHA:         40a050c6
        Built:       
        Go version:  go1.18.4
        OS/Arch:     linux/amd64
Server:
        NODE:        master1.lan
        Tag:         v1.2.0-alpha.0-43-g56a757cc8
        SHA:         56a757cc
        Built:       
        Go version:  go1.18.4
        OS/Arch:     linux/amd64
        Enabled:     RBAC
  • Kubernetes version:
Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.24.2
  • Platform: Proxmox (nocloud)

twelho avatar Jul 20 '22 11:07 twelho

@twelho this is not a Talos issue. If you look at the diff between the 1.11.7 and 1.12.0 versions of the Cilium Helm chart, they changed the default value of securityContext.privileged from true to false. Even though Cilium adds the SYS_ADMIN capability, that's not enough to do mount operations; you'd also need to set privileged: true in the pod securityContext. This can be fixed by adding --set securityContext.privileged=true when running helm install. The Talos docs for Cilium should still work, as they're pinned to Cilium version 1.11.2.

frezbo avatar Jul 20 '22 12:07 frezbo

I'm also waiting on cilium/cilium to publish an official 1.12.0 release, to see if they mention something related to this in the release notes; it's still not updated yet.

frezbo avatar Jul 20 '22 12:07 frezbo

Yes, I managed to figure that out as well, but didn't have time to respond :sweat_smile: In the v1.12.0 Helm chart the mount-cgroup init container securityContext has been changed to

securityContext:
  {{- if .Values.securityContext.privileged }}
  privileged: true
  {{- else }}
  seLinuxOptions:
    level: 's0'
    # Running with spc_t since we have removed the privileged mode.
    # Users can change it to a different type as long as they have the
    # type available on the system.
    type: 'spc_t'
  capabilities:
    drop:
      - ALL
    add:
      # Only used for 'mount' cgroup
      - SYS_ADMIN
      # Used for nsenter
      - SYS_CHROOT
      - SYS_PTRACE
  {{- end}}

which, like you said, doesn't give enough permissions to mount with the default securityContext.privileged=false. The old behavior is equal to securityContext.privileged=true, which works in Talos without issues.
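
For illustration, with the default securityContext.privileged=false the template above should render to roughly the following for the mount-cgroup init container (comments omitted; this is worked out by hand from the template, not taken from actual helm template output):

```yaml
securityContext:
  seLinuxOptions:
    level: 's0'
    type: 'spc_t'
  capabilities:
    drop:
      - ALL
    add:
      - SYS_ADMIN
      - SYS_CHROOT
      - SYS_PTRACE
```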

Should this be considered a Cilium bug? v1.12.0 was technically released yesterday and is pointed to both by the docs and the Helm charts now, but the release is indeed missing from GitHub...

twelho avatar Jul 20 '22 13:07 twelho

I assume it's a Cilium bug, unless I'm missing some information. I was waiting to see if someone else also reports it, just to understand whether we missed something on the Talos side. v1.12.0 still doesn't have any release notes, so that's another thing I'm waiting on.

frezbo avatar Jul 20 '22 13:07 frezbo

Just for reference, in Cilium v1.11.7 the whole securityContext.privileged option is absent and the security context for the mount-cgroup container just states

securityContext:
  privileged: true

twelho avatar Jul 20 '22 13:07 twelho

As this escapes to the host namespace via nsenter, I would consider it to be evil already, so no more evil in privileged :)

What could be interesting to do with Talos is repackaging all Cilium CNI plugins as a Talos system extension; then it probably wouldn't need to nsenter into the host at all (?).

smira avatar Jul 20 '22 13:07 smira

@twelho this is indeed a Talos issue: the /opt directory is mounted without any permissions attached, which causes the permission denied error.

d---------   1 root root   17 Jul 20 21:43 opt

I'll try to work out what the right fix is. I wrongly assumed the cilium-mount binary needed all mount permissions.

frezbo avatar Jul 20 '22 22:07 frezbo

Cilium still fails to start with the fix from #5953, since it requests the SYS_MODULE capability, which is blocked on Talos for all processes except machined. The fix is still to run the pods as privileged.

frezbo avatar Jul 21 '22 20:07 frezbo

Created https://github.com/cilium/cilium/issues/20636 to track upstream

frezbo avatar Jul 22 '22 14:07 frezbo

Had a repro of the SYS_MODULE capability issue as of Cilium 1.12.1 (current).

The quick fix was to go into kubectl edit -n kube-system daemonset cilium and manually edit out both mentions of SYS_MODULE as listed above. After I had done this, the cilium agent daemonset deployed successfully.


The main symptom is the cilium DaemonSet pod in Init:CrashLoopBackOff state. Checking kubectl get pods -o yaml daemonset/cilium shows:

  - containerID: containerd://169ad8f01a02a9deed6be9dec819ad78a9fcdaa2f32d3df5dcd2077fee2533dc
    image: sha256:526bd4754c9cd45a9602873f814648239ebf8405ea2b401f5e7a3546f7310d88
    imageID: quay.io/cilium/cilium@sha256:ea2db1ee21b88127b5c18a96ad155c25485d0815a667ef77c2b7c7f31cab601b
    lastState:
      terminated:
        containerID: containerd://169ad8f01a02a9deed6be9dec819ad78a9fcdaa2f32d3df5dcd2077fee2533dc
        exitCode: 128
        finishedAt: "2022-08-30T10:03:57Z"
        message: 'failed to create containerd task: failed to create shim task: OCI
          runtime create failed: runc create failed: unable to start container process:
          unable to apply caps: operation not permitted: unknown'
        reason: StartError
        startedAt: "1970-01-01T00:00:00Z"
    name: clean-cilium-state
    ready: false
    restartCount: 4
    state:
      waiting:
        message: back-off 1m20s restarting failed container=clean-cilium-state pod=cilium-zcsp8_kube-system(71f770af-ce87-4b45-9996-8b6bd9f82365)
        reason: CrashLoopBackOff

That initContainer has these capabilities:

    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - SYS_MODULE
        - SYS_ADMIN
        - SYS_RESOURCE
        drop:
        - ALL
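
For reference, removing SYS_MODULE from this block, as in the manual kubectl edit described above, would leave the following (a sketch derived from the listing above; the second occurrence in the DaemonSet would need the same change):

```yaml
securityContext:
  capabilities:
    add:
    - NET_ADMIN
    - SYS_ADMIN
    - SYS_RESOURCE
    drop:
    - ALL
```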

The SYS_MODULE culprit is defined in two places in the helm chart's cilium agent DaemonSet. The good news therefore is that this is mainly a configuration issue of the helm chart itself, as opposed to something that would need a code change:

nickbp avatar Aug 30 '22 10:08 nickbp

correct, it's the explicit requesting of SYS_MODULE capability that throws permission denied, since talos drops permission to load modules.

frezbo avatar Aug 30 '22 10:08 frezbo

I tried removing SYS_MODULE from the config. The Cilium deployment went fine, but CoreDNS wasn't able to reach the clusterIP.

After a lot of troubleshooting, all I could find were some SYN packets being sent; it seems the node wasn't aware of the service IPs. The service IP was pointing to the node IP, which was reachable.

I ended up adding this to the Cilium config, and the cluster is running fine now.

securityContext:
  privileged: true

RobM83 avatar Sep 25 '22 13:09 RobM83

I am looking into getting Cilium deployed on top of a Talos cluster, so it might just be a matter of updating the Talos docs for deploying with the Cilium CNI?

(I will drop some notes here in case I wander off before I get it working.)

I noticed that there is a new Cilium release, 1.13.0 => https://github.com/cilium/cilium/releases/tag/v1.13.0 The issue with privileged being required also seems to be solved => https://github.com/cilium/cilium/issues/20636 with a claim that it now works on Talos specifically => https://github.com/cilium/cilium/pull/21506#issuecomment-1265319556

This one also seems to be closed, but I'm unsure whether it is actually fixed (or might no longer be a problem?) => https://github.com/cilium/cilium-cli/pull/635

Deploying with a patch.yaml (no CNI & no kube proxy)

cluster:
  proxy:
    disabled: true
  network:
    cni:
      name: "none"

helm repo add cilium https://helm.cilium.io/
helm repo update

helm install cilium cilium/cilium \
--version 1.13.0 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=poc.example.se \
--set k8sServicePort=6443

Causes a crash loop ... however after

kubectl edit daemonset.apps/cilium removing all 'SYS_MODULE'

will give me

# kubectl get all 
NAME                                                READY   STATUS    RESTARTS         AGE
pod/cilium-5q6wp                                    1/1     Running   0                88s
pod/cilium-65l4t                                    1/1     Running   0                88s
pod/cilium-7tgtn                                    1/1     Running   0                88s
pod/cilium-hs5qh                                    1/1     Running   0                88s
pod/cilium-l2xpt                                    1/1     Running   0                88s
pod/cilium-lsbtk                                    1/1     Running   0                87s
pod/cilium-operator-7fc78cbbdb-7nhvr                1/1     Running   64 (7m18s ago)   6h28m
pod/cilium-operator-7fc78cbbdb-kc47v                1/1     Running   53 (6m15s ago)   6h28m
pod/coredns-5597575654-c8lps                        0/1     Running   0                6h29m
pod/coredns-5597575654-hcs5z                        0/1     Running   0                6h29m
pod/kube-apiserver-talos-control-plane-0            1/1     Running   0                8m21s
pod/kube-apiserver-talos-control-plane-1            1/1     Running   0                4m30s
pod/kube-apiserver-talos-control-plane-2            1/1     Running   0                8m33s
pod/kube-controller-manager-talos-control-plane-0   1/1     Running   1 (8m38s ago)    8m21s
pod/kube-controller-manager-talos-control-plane-1   1/1     Running   1 (4m46s ago)    4m30s
pod/kube-controller-manager-talos-control-plane-2   1/1     Running   1 (8m34s ago)    8m32s
pod/kube-scheduler-talos-control-plane-0            1/1     Running   1 (8m38s ago)    8m22s
pod/kube-scheduler-talos-control-plane-1            1/1     Running   1 (4m46s ago)    4m30s
pod/kube-scheduler-talos-control-plane-2            1/1     Running   1 (8m34s ago)    8m32s

NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/hubble-peer   ClusterIP   10.96.135.108   <none>        443/TCP                  6h28m
service/kube-dns      ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   6h29m

NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/cilium   6         6         6       6            6           kubernetes.io/os=linux   6h28m

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cilium-operator   2/2     2            2           6h28m
deployment.apps/coredns           0/2     2            0           6h29m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/cilium-operator-7fc78cbbdb   2         2         2       6h28m
replicaset.apps/coredns-5597575654           2         2         0       6h29m

senare avatar Feb 21 '23 13:02 senare

cilium is still broken for kube-proxyless installs, see: https://github.com/cilium/cilium/issues/21603

frezbo avatar Feb 21 '23 14:02 frezbo

cilium is still broken for kube-proxyless installs, see: cilium/cilium#21603

edit: with kube-proxyless and privileged set to false

frezbo avatar Feb 21 '23 14:02 frezbo

Is this still an active issue? I wasn't able to reproduce it using Cilium 1.14.3 with Talos 1.5.3. I tested with Cilium's kube-proxy replacement both enabled and disabled.

Also, securityContext.privileged=true should no longer be required because of https://github.com/cilium/cilium/pull/21506 (see also Talos' Cilium installation guide, which uses explicitly configured Linux capabilities: https://www.talos.dev/v1.5/kubernetes-guides/network/deploying-cilium/#method-1-helm-install ).
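
As a rough sketch of what explicitly configured capabilities look like as Helm values: the capability lists and cgroup settings below are recalled from the linked Talos guide rather than stated in this thread, so treat them as an assumption and check them against that page before use. The point to note is that SYS_MODULE appears in neither list.

```yaml
securityContext:
  capabilities:
    # Capabilities for the cilium-agent container (assumed list, no SYS_MODULE)
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    # Capabilities for the clean-cilium-state init container (assumed list)
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE
# Assumed from the guide: use the host's existing cgroup mount instead of auto-mounting
cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup
```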

PhilipSchmid avatar Oct 19 '23 14:10 PhilipSchmid

Yes, this is outdated; Talos runs an integration test with Cilium in both kube-proxy and kube-proxy-less modes.

smira avatar Dec 06 '23 17:12 smira