kubearmor-apparmor-cri-o pods not getting assigned to arm nodes
Hey all, after deploying KubeArmor via the Getting Started guide (helm -> apply sample-config -> deploy nginx), all of the pods on all of my arm-based nodes died and can't restart, with a status of "CreateContainerError".
Describing the pods shows the same issue:
kubelet Error: container create failed: write file /proc/thread-self/attr/apparmor/exec: No such file or directory
After confirming that AppArmor is installed and working correctly on all of my nodes, I later deployed a new x64 node to swap out an arm node and discovered the same issue would happen there until the kubearmor-apparmor-cri-o pod got deployed on it. Checking my arm nodes, I discovered that none of them have this pod assigned or running. It's not a crash-loop situation -- the pod is simply never assigned.
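For anyone hitting the same error: a quick way to check, on the node itself, whether the kernel interface that error message refers to is actually exposed. This is a minimal sketch; the path is taken straight from the error above, and the fallback messages are just illustrative.

```shell
# Run directly on an affected node. The container runtime writes to
# /proc/thread-self/attr/apparmor/exec when starting a confined container,
# so this checks that exact path, plus whether the apparmor module
# reports itself enabled.
if [ -e /proc/thread-self/attr/apparmor/exec ]; then
    echo "apparmor exec interface: present"
else
    echo "apparmor exec interface: MISSING"
fi
cat /sys/module/apparmor/parameters/enabled 2>/dev/null \
    || echo "apparmor module parameter not readable"
```

If the exec interface is missing even though `aa-status` looks healthy, the kernel may have been built without the fine-grained AppArmor proc interface the runtime expects.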
I've started a thread here if that's at all helpful: https://cloud-native.slack.com/archives/C02R319HVL3/p1739059410548219
Hey @zestysoft , would you be able to provide more details on:
- k8s version used
- node distro used (`kubectl get nodes -o wide` would work; `kubectl describe nodes` would be great)
- If possible, attach the zip file generated by `karmor sysdump`. Please note this zip file will contain detailed metadata of your pods. Please inspect it before sending.
Based on the snitch logs you sent on the Slack channel, we can deduce the following:
- It's an ARM Raspberry Pi running kernel 5.15.0-1061-raspi
- The following LSMs are available: lockdown, capability, yama, apparmor
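That LSM list can be double-checked directly on the node -- the kernel exports the list of LSMs it initialized. A small sketch (the fallback message is illustrative, since the path is host-dependent):

```shell
# Prints the comma-separated list of LSMs the running kernel initialized,
# e.g. "lockdown,capability,yama,apparmor". "apparmor" must appear here
# for KubeArmor's apparmor enforcer to work on this node.
cat /sys/kernel/security/lsm 2>/dev/null || echo "securityfs not mounted"
```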
@daemon1024 @rksharma95, the fastest way for us would be to set up and test the platform.
k8s: 1.30.8
I'll describe one of the nodes giving this trouble, but they should all be identical:
Name: k8s-node-4
Roles: <none>
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/os=linux
feature.node.kubernetes.io/cpu-cpuid.ASIMD=true
feature.node.kubernetes.io/cpu-cpuid.CPUID=true
feature.node.kubernetes.io/cpu-cpuid.CRC32=true
feature.node.kubernetes.io/cpu-cpuid.EVTSTRM=true
feature.node.kubernetes.io/cpu-cpuid.FP=true
feature.node.kubernetes.io/cpu-hardware_multithreading=false
feature.node.kubernetes.io/cpu-model.family=15
feature.node.kubernetes.io/cpu-model.id=53379
feature.node.kubernetes.io/cpu-model.vendor_id=ARM
feature.node.kubernetes.io/kernel-config.NO_HZ=true
feature.node.kubernetes.io/kernel-config.NO_HZ_IDLE=true
feature.node.kubernetes.io/kernel-config.PREEMPT=true
feature.node.kubernetes.io/kernel-version.full=5.15.0-1061-raspi
feature.node.kubernetes.io/kernel-version.major=5
feature.node.kubernetes.io/kernel-version.minor=15
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/storage-nonrotationaldisk=true
feature.node.kubernetes.io/system-os_release.ID=ubuntu
feature.node.kubernetes.io/system-os_release.VERSION_ID=22.04
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=22
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04
kubearmor.io/apparmorfs=yes
kubearmor.io/btf=no
kubearmor.io/enforcer=apparmor
kubearmor.io/rand=gqzg
kubearmor.io/runtime=cri-o
kubearmor.io/seccomp=yes
kubearmor.io/securityfs=yes
kubearmor.io/socket=run_crio_crio.sock
kubernetes.io/arch=arm64
kubernetes.io/hostname=k8s-node-4
kubernetes.io/os=linux
Annotations: csi.volume.kubernetes.io/nodeid:
{"rook-ceph.cephfs.csi.ceph.com":"k8s-node-4","rook-ceph.rbd.csi.ceph.com":"k8s-node-4","smb.csi.k8s.io":"k8s-node-4"}
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
nfd.node.kubernetes.io/feature-labels:
cpu-cpuid.ASIMD,cpu-cpuid.CPUID,cpu-cpuid.CRC32,cpu-cpuid.EVTSTRM,cpu-cpuid.FP,cpu-hardware_multithreading,cpu-model.family,cpu-model.id,c...
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 192.168.100.106/24
projectcalico.org/IPv4IPIPTunnelAddr: 172.17.55.128
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 23 Nov 2021 01:59:45 -0800
Taints: node.kubernetes.io/unschedulable:NoSchedule
Unschedulable: true
Lease:
HolderIdentity: k8s-node-4
AcquireTime: <unset>
RenewTime: Sun, 09 Feb 2025 19:19:19 -0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 24 Dec 2024 23:30:23 -0800 Tue, 24 Dec 2024 23:30:23 -0800 CalicoIsUp Calico is running on this node
MemoryPressure False Sun, 09 Feb 2025 19:19:21 -0800 Wed, 15 Jan 2025 20:46:02 -0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 09 Feb 2025 19:19:21 -0800 Wed, 15 Jan 2025 20:46:02 -0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sun, 09 Feb 2025 19:19:21 -0800 Wed, 15 Jan 2025 20:46:02 -0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 09 Feb 2025 19:19:21 -0800 Wed, 15 Jan 2025 20:46:02 -0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.100.106
Hostname: k8s-node-4
Capacity:
cpu: 4
ephemeral-storage: 242868516Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 7995328Ki
pods: 110
squat.ai/audio: 2
squat.ai/capture: 0
squat.ai/fuse: 0
squat.ai/serial: 0
squat.ai/skyconnect: 0
squat.ai/video: 0
squat.ai/zigbee: 0
squat.ai/zwave: 0
Allocatable:
cpu: 4
ephemeral-storage: 223827623976
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 7892928Ki
pods: 110
squat.ai/audio: 2
squat.ai/capture: 0
squat.ai/fuse: 0
squat.ai/serial: 0
squat.ai/skyconnect: 0
squat.ai/video: 0
squat.ai/zigbee: 0
squat.ai/zwave: 0
System Info:
Machine ID: 83a199391c7f4601a960890310871b37
System UUID: 83a199391c7f4601a960890310871b37
Boot ID: 0102137e-3b75-45e0-9d62-6e73e8cc265a
Kernel Version: 5.15.0-1061-raspi
OS Image: Ubuntu 22.04.5 LTS
Operating System: linux
Architecture: arm64
Container Runtime Version: cri-o://1.29.11
Kubelet Version: v1.30.8
Kube-Proxy Version: v1.30.8
Non-terminated Pods: (15 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-mjsds 250m (6%) 0 (0%) 0 (0%) 0 (0%) 46d
kube-system csi-smb-node-vfdh8 30m (0%) 0 (0%) 60Mi (0%) 400Mi (5%) 14d
kube-system generic-device-plugin-dnrxj 50m (1%) 50m (1%) 20Mi (0%) 20Mi (0%) 510d
kube-system kube-proxy-whgcx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 46d
kube-system vpa-admission-controller-5bc54c84fc-tlxfc 50m (1%) 200m (5%) 200Mi (2%) 500Mi (6%) 46d
metallb-system speaker-khbvq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 27h
monitoring prometheus-prometheus-node-exporter-7xjv2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
nvidia-device-plugin nvdp-node-feature-discovery-worker-fxwpw 0 (0%) 0 (0%) 0 (0%) 0 (0%) 21h
rook-ceph csi-cephfsplugin-k5jrw 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
rook-ceph csi-rbdplugin-pvvhr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
rook-ceph rook-ceph-agent-tqr4l 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
rook-ceph rook-ceph-crashcollector-k8s-node-4-6dbf98998d-bvmnm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
rook-ceph rook-ceph-exporter-k8s-node-4-665b57454b-jxs22 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
rook-ceph rook-ceph-osd-0-678966c959-8557q 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
security-profiles-operator spod-zv292 150m (3%) 0 (0%) 96Mi (1%) 256Mi (3%) 23h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 530m (13%) 250m (6%)
memory 376Mi (4%) 1176Mi (15%)
ephemeral-storage 60Mi (0%) 220Mi (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
hugepages-32Mi 0 (0%) 0 (0%)
hugepages-64Ki 0 (0%) 0 (0%)
squat.ai/audio 0 0
squat.ai/capture 0 0
squat.ai/fuse 0 0
squat.ai/serial 0 0
squat.ai/skyconnect 0 0
squat.ai/video 0 0
squat.ai/zigbee 0 0
squat.ai/zwave 0 0
Events: <none>
I'll follow up with the zip once the dump completes and I can sift through it.
The node has been cordoned, so new pods will not be scheduled on it. Right!
Taints: node.kubernetes.io/unschedulable:NoSchedule
Unschedulable: true
@zestysoft the problem seems to be:
Taints: node.kubernetes.io/unschedulable:NoSchedule
Unschedulable: true
The node has a taint making it unschedulable, and the KubeArmor daemonset seems to be respecting that.
Can you remove this taint (uncordon the node) and check whether KubeArmor gets scheduled?
That was done afterward, to stop the scheduler from attempting to push pods to the arm nodes because of this issue.
Even in a cordoned state, I would still expect to see the missing pod with a status of Pending, right?
The node is uncordoned. I saw snitch run on it just now.
How long is the sysdump supposed to take? It has stopped here:
Checking all pods labeled with kubearmor-app
getting logs from pod=kubearmor-apparmor-cri-o-c5583-7gq46 container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-8vpd5 container=kubearmor
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
getting logs from pod=kubearmor-apparmor-cri-o-c5583-bs8kg container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-csbkv container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-cw9xw container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-lvxvs container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-mljv4 container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-xzjww container=kubearmor
getting logs from pod=kubearmor-controller-587d765dc4-qfhbm container=kube-rbac-proxy
getting logs from pod=kubearmor-controller-587d765dc4-qfhbm container=manager
getting logs from pod=kubearmor-operator-589f6fbd55-l47wl container=kubearmor-operator
getting logs from pod=kubearmor-relay-7f94fd7f4f-p48rj container=kubearmor-relay-server
Yeah, something's not working with the sysdump. I tried it again, and if I hit enter after it pauses for over 5 minutes, I get those EOF lines. Is there a timeout after which it will continue with what it was able to grab?
Checking all pods labeled with kubearmor-app
getting logs from pod=kubearmor-apparmor-cri-o-c5583-7gq46 container=kubearmor
tar: removing leading '/' from member names
getting logs from pod=kubearmor-apparmor-cri-o-c5583-8vpd5 container=kubearmor
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
tar: removing leading '/' from member names
getting logs from pod=kubearmor-apparmor-cri-o-c5583-bs8kg container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-csbkv container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-cw9xw container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-lvxvs container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-mljv4 container=kubearmor
getting logs from pod=kubearmor-apparmor-cri-o-c5583-xzjww container=kubearmor
getting logs from pod=kubearmor-controller-587d765dc4-qfhbm container=kube-rbac-proxy
getting logs from pod=kubearmor-controller-587d765dc4-qfhbm container=manager
getting logs from pod=kubearmor-operator-589f6fbd55-l47wl container=kubearmor-operator
getting logs from pod=kubearmor-relay-7f94fd7f4f-p48rj container=kubearmor-relay-server
getting logs from pod=kubearmor-snitch-dhsdz-jvfq4 container=snitch
getting logs from pod=kubearmor-snitch-pchtw-d9pkb container=snitch
getting logs from pod=kubearmor-snitch-t7cgn-hzz4h container=snitch
E0209 19:46:24.475729 41362 v2.go:104] EOF
E0209 19:46:31.719649 41362 v2.go:104] EOF
Also, correct me if I'm wrong, but I thought DaemonSets (which the kubearmor-apparmor-cri-o pods come from) by design don't honor the cordoned state of a node. I just tested this with one of the working amd64 nodes: I cordoned it, killed the kubearmor-apparmor-cri-o pod running on it, and it came back up without issue.
Any updates? Last I read, this was a problem with containers running correctly on arm nodes (or possibly just Raspberry Pi arm nodes?).
In the meantime, is there a way to exempt those nodes so that they'll continue to function without KubeArmor futzing with em?
@zestysoft we're planning to get this handled in the v1.5.4 release. To exempt those nodes, you can make use of tolerations: https://github.com/kubearmor/KubeArmor/blob/c933d8cebeda5ed48f0a015db5f0bd012ec7e006/deployments/helm/KubeArmorOperator/crds/operator.kubearmor.com_kubearmorconfigs.yaml#L337
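For anyone landing here, a rough sketch of the taint-plus-tolerations idea. The taint key below is hypothetical, and the exact placement of the tolerations field inside KubeArmorConfig is an assumption -- confirm it against the linked CRD schema.

```yaml
# 1) Taint the arm nodes (hypothetical key; any key/value works):
#      kubectl taint nodes k8s-node-4 example.com/no-kubearmor=true:NoSchedule
#    Without a matching toleration, KubeArmor's daemonset pods will skip
#    those nodes. Caveat: every other workload on those nodes also needs
#    a matching toleration to keep scheduling there.
#
# 2) The KubeArmorConfig tolerations field (linked above) is where a
#    matching toleration would go if you later want KubeArmor back on them:
apiVersion: operator.kubearmor.com/v1
kind: KubeArmorConfig
metadata:
  name: kubearmorconfig-default
  namespace: kubearmor
spec:
  tolerations:            # assumed placement -- confirm against the CRD
    - key: example.com/no-kubearmor
      operator: Exists
      effect: NoSchedule
```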
Just FYI @rksharma95, I recently tried 1.5.4 but saw the same results.