
kubearmor-apparmor-cri-o pods not getting assigned to arm nodes

Open zestysoft opened this issue 1 year ago • 12 comments

Hey all, after deploying KubeArmor via the Getting Started guide (helm -> apply sample-config -> deploy nginx), all of the pods on all of my ARM-based nodes died and can't restart, showing a status of "CreateContainerError".

Describing the pods shows the same issue:

kubelet Error: container create failed: write file /proc/thread-self/attr/apparmor/exec: No such file or directory

After confirming that AppArmor is installed and working correctly on all of my nodes, I later deployed a new x64 node to swap out an ARM node and discovered that the same issue happened there before the kubearmor-apparmor-cri-o pod got deployed on it. Checking my ARM nodes, I found that none of them have this pod assigned or running. It's not a crash-loop situation -- the pod is simply never assigned.
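For reference, here's roughly how I confirmed AppArmor on each node (a quick sketch run over SSH; aa-status comes from the apparmor-utils package):

# Is the AppArmor module enabled in the kernel?
cat /sys/module/apparmor/parameters/enabled    # should print "Y"

# Are profiles loaded and enforcing?
sudo aa-status

# The exact path CRI-O fails to write to in the error above;
# on a working node this directory should list entries such as "exec"
ls /proc/thread-self/attr/apparmor/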

I've started a thread here if that's at all helpful: https://cloud-native.slack.com/archives/C02R319HVL3/p1739059410548219

zestysoft avatar Feb 09 '25 06:02 zestysoft

Hey @zestysoft, would you be able to provide more details on:

  1. k8s version used
  2. node distro used (kubectl get nodes -o wide would work, kubectl describe nodes would be great)
  3. If possible, attach the zip file generated by karmor sysdump. Please note this zip file will contain detailed metadata of your pods; please inspect it before sending. (The commands for all three items are sketched below.)
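For convenience, the commands for the three items above would look roughly like this (a sketch, run against the affected cluster):

# 1. Kubernetes version (client and server)
kubectl version

# 2. Node distro, kernel, and runtime details
kubectl get nodes -o wide
kubectl describe nodes

# 3. Support bundle -- inspect the zip before sharing it
karmor sysdump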

Based on the snitch logs you sent on the Slack channel, we can deduce the following:

  1. It's an ARM Raspberry Pi running kernel 5.15.0-1061-raspi
  
  2. The following LSMs are available: lockdown, capability, yama, apparmor (this can be cross-checked directly on the node, as shown below)
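A minimal way to do that cross-check on the node itself (a quick sketch, assuming shell access):

# Comma-separated list of LSMs the running kernel enabled
cat /sys/kernel/security/lsm
# expected on this node: lockdown,capability,yama,apparmor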

@daemon1024 @rksharma95, the fastest way for us would be to set up and test the platform.

nyrahul avatar Feb 10 '25 03:02 nyrahul

k8s: 1.30.8

I'll describe one of the nodes giving this trouble, but they should all be identical:

Name:               k8s-node-4
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/cpu-cpuid.ASIMD=true
                    feature.node.kubernetes.io/cpu-cpuid.CPUID=true
                    feature.node.kubernetes.io/cpu-cpuid.CRC32=true
                    feature.node.kubernetes.io/cpu-cpuid.EVTSTRM=true
                    feature.node.kubernetes.io/cpu-cpuid.FP=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=false
                    feature.node.kubernetes.io/cpu-model.family=15
                    feature.node.kubernetes.io/cpu-model.id=53379
                    feature.node.kubernetes.io/cpu-model.vendor_id=ARM
                    feature.node.kubernetes.io/kernel-config.NO_HZ=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ_IDLE=true
                    feature.node.kubernetes.io/kernel-config.PREEMPT=true
                    feature.node.kubernetes.io/kernel-version.full=5.15.0-1061-raspi
                    feature.node.kubernetes.io/kernel-version.major=5
                    feature.node.kubernetes.io/kernel-version.minor=15
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=ubuntu
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=22.04
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=22
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04
                    kubearmor.io/apparmorfs=yes
                    kubearmor.io/btf=no
                    kubearmor.io/enforcer=apparmor
                    kubearmor.io/rand=gqzg
                    kubearmor.io/runtime=cri-o
                    kubearmor.io/seccomp=yes
                    kubearmor.io/securityfs=yes
                    kubearmor.io/socket=run_crio_crio.sock
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=k8s-node-4
                    kubernetes.io/os=linux
Annotations:        csi.volume.kubernetes.io/nodeid:
                      {"rook-ceph.cephfs.csi.ceph.com":"k8s-node-4","rook-ceph.rbd.csi.ceph.com":"k8s-node-4","smb.csi.k8s.io":"k8s-node-4"}
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
                    nfd.node.kubernetes.io/feature-labels:
                      cpu-cpuid.ASIMD,cpu-cpuid.CPUID,cpu-cpuid.CRC32,cpu-cpuid.EVTSTRM,cpu-cpuid.FP,cpu-hardware_multithreading,cpu-model.family,cpu-model.id,c...
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.100.106/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.17.55.128
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 23 Nov 2021 01:59:45 -0800
Taints:             node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:  k8s-node-4
  AcquireTime:     <unset>
  RenewTime:       Sun, 09 Feb 2025 19:19:19 -0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 24 Dec 2024 23:30:23 -0800   Tue, 24 Dec 2024 23:30:23 -0800   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Sun, 09 Feb 2025 19:19:21 -0800   Wed, 15 Jan 2025 20:46:02 -0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sun, 09 Feb 2025 19:19:21 -0800   Wed, 15 Jan 2025 20:46:02 -0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sun, 09 Feb 2025 19:19:21 -0800   Wed, 15 Jan 2025 20:46:02 -0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sun, 09 Feb 2025 19:19:21 -0800   Wed, 15 Jan 2025 20:46:02 -0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.100.106
  Hostname:    k8s-node-4
Capacity:
  cpu:                  4
  ephemeral-storage:    242868516Ki
  hugepages-1Gi:        0
  hugepages-2Mi:        0
  hugepages-32Mi:       0
  hugepages-64Ki:       0
  memory:               7995328Ki
  pods:                 110
  squat.ai/audio:       2
  squat.ai/capture:     0
  squat.ai/fuse:        0
  squat.ai/serial:      0
  squat.ai/skyconnect:  0
  squat.ai/video:       0
  squat.ai/zigbee:      0
  squat.ai/zwave:       0
Allocatable:
  cpu:                  4
  ephemeral-storage:    223827623976
  hugepages-1Gi:        0
  hugepages-2Mi:        0
  hugepages-32Mi:       0
  hugepages-64Ki:       0
  memory:               7892928Ki
  pods:                 110
  squat.ai/audio:       2
  squat.ai/capture:     0
  squat.ai/fuse:        0
  squat.ai/serial:      0
  squat.ai/skyconnect:  0
  squat.ai/video:       0
  squat.ai/zigbee:      0
  squat.ai/zwave:       0
System Info:
  Machine ID:                 83a199391c7f4601a960890310871b37
  System UUID:                83a199391c7f4601a960890310871b37
  Boot ID:                    0102137e-3b75-45e0-9d62-6e73e8cc265a
  Kernel Version:             5.15.0-1061-raspi
  OS Image:                   Ubuntu 22.04.5 LTS
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  cri-o://1.29.11
  Kubelet Version:            v1.30.8
  Kube-Proxy Version:         v1.30.8
Non-terminated Pods:          (15 in total)
  Namespace                   Name                                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                    ------------  ----------  ---------------  -------------  ---
  kube-system                 calico-node-mjsds                                       250m (6%)     0 (0%)      0 (0%)           0 (0%)         46d
  kube-system                 csi-smb-node-vfdh8                                      30m (0%)      0 (0%)      60Mi (0%)        400Mi (5%)     14d
  kube-system                 generic-device-plugin-dnrxj                             50m (1%)      50m (1%)    20Mi (0%)        20Mi (0%)      510d
  kube-system                 kube-proxy-whgcx                                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         46d
  kube-system                 vpa-admission-controller-5bc54c84fc-tlxfc               50m (1%)      200m (5%)   200Mi (2%)       500Mi (6%)     46d
  metallb-system              speaker-khbvq                                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         27h
  monitoring                  prometheus-prometheus-node-exporter-7xjv2               0 (0%)        0 (0%)      0 (0%)           0 (0%)         28h
  nvidia-device-plugin        nvdp-node-feature-discovery-worker-fxwpw                0 (0%)        0 (0%)      0 (0%)           0 (0%)         21h
  rook-ceph                   csi-cephfsplugin-k5jrw                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         28h
  rook-ceph                   csi-rbdplugin-pvvhr                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         28h
  rook-ceph                   rook-ceph-agent-tqr4l                                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         28h
  rook-ceph                   rook-ceph-crashcollector-k8s-node-4-6dbf98998d-bvmnm    0 (0%)        0 (0%)      0 (0%)           0 (0%)         28h
  rook-ceph                   rook-ceph-exporter-k8s-node-4-665b57454b-jxs22          0 (0%)        0 (0%)      0 (0%)           0 (0%)         28h
  rook-ceph                   rook-ceph-osd-0-678966c959-8557q                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         28h
  security-profiles-operator  spod-zv292                                              150m (3%)     0 (0%)      96Mi (1%)        256Mi (3%)     23h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource             Requests    Limits
  --------             --------    ------
  cpu                  530m (13%)  250m (6%)
  memory               376Mi (4%)  1176Mi (15%)
  ephemeral-storage    60Mi (0%)   220Mi (0%)
  hugepages-1Gi        0 (0%)      0 (0%)
  hugepages-2Mi        0 (0%)      0 (0%)
  hugepages-32Mi       0 (0%)      0 (0%)
  hugepages-64Ki       0 (0%)      0 (0%)
  squat.ai/audio       0           0
  squat.ai/capture     0           0
  squat.ai/fuse        0           0
  squat.ai/serial      0           0
  squat.ai/skyconnect  0           0
  squat.ai/video       0           0
  squat.ai/zigbee      0           0
  squat.ai/zwave       0           0
Events:                <none>

I'll follow up with the zip once the dump completes and I can sift through it.

zestysoft avatar Feb 10 '25 03:02 zestysoft

The node has been cordoned, so new pods will not be scheduled on it:

Taints:             node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true

rksharma95 avatar Feb 10 '25 03:02 rksharma95

@zestysoft the problem seems to be:

Taints:             node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true

The node has a taint making it unschedulable, and the KubeArmor DaemonSet seems to be respecting that.

Can you remove this taint (uncordon the node) and check if KubeArmor gets scheduled?
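Concretely, something along these lines (a sketch; node name taken from the describe output above):

# Remove the unschedulable taint by uncordoning the node
kubectl uncordon k8s-node-4

# Then check whether a kubearmor-apparmor-cri-o pod lands on it
kubectl get pods -A -o wide | grep k8s-node-4 | grep kubearmor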

daemon1024 avatar Feb 10 '25 03:02 daemon1024

That was done afterwards, to stop the scheduler from attempting to push pods onto the ARM nodes because of this issue.

zestysoft avatar Feb 10 '25 03:02 zestysoft

Even if it's in a cordoned state, I would still expect to see the missing pod with a status of Pending, right?

zestysoft avatar Feb 10 '25 03:02 zestysoft

The node is uncordoned. I saw snitch run on it just now.

How long is the sysdump supposed to take? It has stopped here:

Checking all pods labeled with kubearmor-app
getting logs from pod=kubearmor-apparmor-cri-o-c5583-7gq46 container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-8vpd5 container=kubearmor
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
getting logs from pod=kubearmor-apparmor-cri-o-c5583-bs8kg container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-csbkv container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-cw9xw container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-lvxvs container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-mljv4 container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-xzjww container=kubearmor
getting logs from pod=kubearmor-controller-587d765dc4-qfhbm container=kube-rbac-proxy
getting logs from pod=kubearmor-controller-587d765dc4-qfhbm container=manager
getting logs from pod=kubearmor-operator-589f6fbd55-l47wl container=kubearmor-operator
getting logs from pod=kubearmor-relay-7f94fd7f4f-p48rj container=kubearmor-relay-server

zestysoft avatar Feb 10 '25 03:02 zestysoft

Yeah, something's not working with the sysdump -- I tried it again, and if I hit Enter after it pauses for over 5 minutes, I get those EOF lines. Is there a timeout after which it will progress with what it was able to grab?

Checking all pods labeled with kubearmor-app
getting logs from pod=kubearmor-apparmor-cri-o-c5583-7gq46 container=kubearmor
tar: removing leading '/' from member names
getting logs from pod=kubearmor-apparmor-cri-o-c5583-8vpd5 container=kubearmor
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
getting logs from pod=kubearmor-apparmor-cri-o-c5583-bs8kg container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-csbkv container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-cw9xw container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-lvxvs container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-mljv4 container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-xzjww container=kubearmor
getting logs from pod=kubearmor-controller-587d765dc4-qfhbm container=kube-rbac-proxy
getting logs from pod=kubearmor-controller-587d765dc4-qfhbm container=manager
getting logs from pod=kubearmor-operator-589f6fbd55-l47wl container=kubearmor-operator
getting logs from pod=kubearmor-relay-7f94fd7f4f-p48rj container=kubearmor-relay-server
getting logs from pod=kubearmor-snitch-dhsdz-jvfq4 container=snitch
getting logs from pod=kubearmor-snitch-pchtw-d9pkb container=snitch
getting logs from pod=kubearmor-snitch-t7cgn-hzz4h container=snitch

E0209 19:46:24.475729   41362 v2.go:104] EOF

E0209 19:46:31.719649   41362 v2.go:104] EOF
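If it helps, I can also collect the same logs manually with kubectl as a fallback (a rough sketch; it selects pods by the kubearmor-app label that sysdump reports above):

# Dump logs from every pod carrying the kubearmor-app label, across all namespaces
for ns_pod in $(kubectl get pods -A -l kubearmor-app \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'); do
  ns=${ns_pod%%/*}
  pod=${ns_pod##*/}
  kubectl logs -n "$ns" "$pod" --all-containers=true > "${pod}.log"
done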

zestysoft avatar Feb 10 '25 03:02 zestysoft

Also, correct me if I'm wrong, but I thought DaemonSets (which the kubearmor-apparmor-cri-o pods come from) by design don't honor the cordoned state of a node. I just tested this with one of the working amd64 nodes -- I cordoned it, killed the kubearmor-apparmor-cri-o pod running on it, and it came back up without issue.
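To double-check that, the tolerations the DaemonSet controller injects can be read straight off the running pods (a sketch; it assumes the KubeArmor pods carry the kubearmor-app label, as the sysdump output above suggests):

# Print each KubeArmor pod and the taint keys it tolerates;
# DaemonSet pods normally tolerate node.kubernetes.io/unschedulable automatically
kubectl get pods -A -l kubearmor-app \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.tolerations[*].key}{"\n"}{end}'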

zestysoft avatar Feb 10 '25 04:02 zestysoft

Any updates? Last I read, this was a problem with containers running correctly on ARM nodes (or possibly just Raspberry Pi ARM nodes?).

In the meantime, is there a way to exempt those nodes so that they'll continue to function without KubeArmor futzing with them?

zestysoft avatar Feb 19 '25 06:02 zestysoft

@zestysoft we're planning to get this handled in the v1.5.4 release. To exempt those nodes in the meantime, you can make use of tolerations: https://github.com/kubearmor/KubeArmor/blob/c933d8cebeda5ed48f0a015db5f0bd012ec7e006/deployments/helm/KubeArmorOperator/crds/operator.kubearmor.com_kubearmorconfigs.yaml#L337
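To find where that sits in the installed CRD, kubectl explain can walk the schema (a sketch; the resource name kubearmorconfigs is assumed from the linked CRD file):

# Search the KubeArmorConfig schema for toleration-related fields
kubectl explain kubearmorconfigs --recursive | grep -i toleration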

rksharma95 avatar Feb 19 '25 07:02 rksharma95

Just FYI @rksharma95, I recently tried 1.5.4 but saw the same results.

zestysoft avatar Apr 05 '25 03:04 zestysoft