nodeinit pods failing in 1.15.5
Is there an existing issue for this?
- [X] I have searched the existing issues
What happened?
We have recently tried to upgrade to 1.15.5 and the latest pre-release, but our nodeinit pods are failing with the following error:
nsenter: cannot open /proc/1/ns/ipc: Permission denied
!!! startup-script failed! exit code '1'
Reverting to 1.15.4 resolves the issue.
Cilium Version
1.15.5
Kernel Version
.
Kubernetes Version
v1.30.0-gke.145700
Regression
1.15.4
Sysdump
No response
Relevant log output
nsenter: cannot open /proc/1/ns/ipc: Permission denied
!!! startup-script failed! exit code '1'
Anything else?
No response
Cilium Users Document
- [ ] Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Actually, I think it may have been introduced here: https://github.com/cilium/cilium/pull/31641/files#diff-0ea42ad21164b19bec1732225e254d3096d1e4040481c00053669287d81015fe
Do you have any custom helm config related to the nodeinit pod?
@lmb
nodeinit:
  enabled: true
  reconfigureKubelet: true
  removeCbrBridge: true
@dlahn can you provide the steps you used for both 1.15.4 and 1.15.5? Thank you
@aanm I think it may have happened here: https://github.com/cilium/cilium/pull/31641/files#diff-0ea42ad21164b19bec1732225e254d3096d1e4040481c00053669287d81015fe. So I misspoke; I think the last working version was 1.15.3. If we simply upgrade the Helm chart to the newest version, we receive these errors.
The only way to get 1.15.4 to work is to add this to the nodeinit section:
nodeinit:
  enabled: true
  reconfigureKubelet: true
  removeCbrBridge: true
  image:
    tag: "62093c5c233ea914bfa26a10ba41f8780d9b737f"
However, this doesn't work in 1.15.5.
Any ideas here?
Hi! I raised this issue on the K8s GitHub: https://github.com/kubernetes/kubernetes/issues/125069. I don't know if your error is related, but I couldn't start Cilium with version 1.15.5 either, because the pod annotations were removed and replaced by appArmorProfile type Unconfined. The appArmorProfile Unconfined setting doesn't work for me with containerd. So if you also use containerd, you can try re-adding the annotations as they were in 1.15.4:
container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
container.apparmor.security.beta.kubernetes.io/clean-cilium-state: "unconfined"
container.apparmor.security.beta.kubernetes.io/mount-cgroup: "unconfined"
container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: "unconfined"
Adding these annotations seems to have resolved the issue for us.
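For anyone applying the same workaround through Helm rather than by patching the DaemonSets, here is a minimal sketch of a values override. It assumes the chart exposes `podAnnotations` for the agent pods and `nodeinit.podAnnotations` for the nodeinit pods, and that the nodeinit container is named `node-init`. None of that is confirmed in this thread, so check it against the `values.yaml` and templates of your chart version first.

```yaml
# Sketch of a Helm values override, not a verified fix.
# Assumption: the chart supports `podAnnotations` (agent pods)
# and `nodeinit.podAnnotations` (nodeinit pods).
podAnnotations:
  # Annotation suffix must match the container name in the pod spec.
  container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
  container.apparmor.security.beta.kubernetes.io/clean-cilium-state: "unconfined"
  container.apparmor.security.beta.kubernetes.io/mount-cgroup: "unconfined"
  container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: "unconfined"

nodeinit:
  enabled: true
  reconfigureKubelet: true
  removeCbrBridge: true
  podAnnotations:
    # Assumption: the nodeinit container is named `node-init`.
    container.apparmor.security.beta.kubernetes.io/node-init: "unconfined"
```

The part after the final `/` in each annotation key has to match a container name in the pod, so adjust the list if your rendered pods contain different containers.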
I ended up with the same issue/solution, found it by doing a helm chart diff.
Which leads to the follow-up question of why containerd doesn't support the new profile type. Are you also running Rancher RKE2?
For me, containerd is working fine in the end. The problem came from the OpenTelemetry operator, which has a mutating admission webhook that was removing the appArmorProfile key from the pod definition. They needed to update to Go 1.22 before they could pick up the latest K8s schemas and accept the appArmorProfile key. For now I've removed the OpenTelemetry operator and my Cilium is working fine again. The fix in the OpenTelemetry operator is done, so we just need to wait for the next release and everything should be good. If you're not using the OpenTelemetry operator, check whether something else you run uses a mutating admission webhook.
Just cert-manager and cnpg for those (and an istio-sidecar-injector despite having removed istio a while ago)
Weirdly, cilium did run fine after adding the annotations back? I haven't tried with a new cluster just yet.
Well, cert-manager is not the problem. I'm using it too, and its mutating webhook does not modify the securityContext. If it's a mutating webhook problem, it makes sense that the annotations work: K8s looks for either appArmorProfile in securityContext or the annotations to give the container the correct AppArmor permissions. If your webhook removes appArmorProfile from the securityContext but leaves the annotations, K8s still receives what it needs.
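To make the two forms concrete, here is an illustrative pod manifest that sets the same AppArmor profile both ways. The pod name, container name, and image are hypothetical placeholders and are not taken from the Cilium chart. A webhook built against an older API schema could drop the `appArmorProfile` field while leaving the annotation untouched.

```yaml
# Illustrative only: names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-example
  annotations:
    # Older annotation form; the suffix is the container name.
    container.apparmor.security.beta.kubernetes.io/example: "unconfined"
spec:
  containers:
    - name: example
      image: registry.example.com/example:latest
      securityContext:
        # Newer field form, available from Kubernetes 1.30.
        appArmorProfile:
          type: Unconfined
```

Comparing what you submitted with the output of `kubectl get pod <name> -o yaml` shows whether the `appArmorProfile` field survived admission.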
I don't know cnpg, but a quick look shows some references to appArmorProfile in releases/cnpg-1.23.2.yaml. Those references are not there in versions older than 1.23.2, so maybe you aren't using the latest version?
Yeah, but that webhook only applies to cnpg Postgres backup objects according to its rules. Either way, the created cilium pods end up with the AppArmor context defined as expected in the running pod spec.
Fixed, see the solution in here