
nodeinit pods failing in 1.15.5

Open dlahn opened this issue 1 year ago • 6 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

What happened?

We have recently tried to upgrade to 1.15.5 and the latest pre-release, but our nodeinit pods are failing with the following error:

nsenter: cannot open /proc/1/ns/ipc: Permission denied
!!! startup-script failed! exit code '1'

Reverting to 1.15.4 resolves the issue.

Cilium Version

1.15.5

Kernel Version

.

Kubernetes Version

v1.30.0-gke.145700

Regression

1.15.4

Sysdump

No response

Relevant log output

nsenter: cannot open /proc/1/ns/ipc: Permission denied
!!! startup-script failed! exit code '1'

Anything else?

No response

Cilium Users Document

  • [ ] Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

dlahn avatar May 22 '24 17:05 dlahn

Actually, I think it may have been introduced here: https://github.com/cilium/cilium/pull/31641/files#diff-0ea42ad21164b19bec1732225e254d3096d1e4040481c00053669287d81015fe

dlahn avatar May 22 '24 18:05 dlahn

Do you have any custom helm config related to the nodeinit pod?

lmb avatar May 23 '24 10:05 lmb

@lmb

  nodeinit:
    enabled: true
    reconfigureKubelet: true
    removeCbrBridge: true

dlahn avatar May 23 '24 15:05 dlahn

@dlahn can you provide the steps you used for both 1.15.4 and 1.15.5? Thank you

aanm avatar May 23 '24 15:05 aanm

@aanm I think it may have been introduced here, https://github.com/cilium/cilium/pull/31641/files#diff-0ea42ad21164b19bec1732225e254d3096d1e4040481c00053669287d81015fe, so I misspoke: I think the last working version was 1.15.3. If we simply upgrade the Helm chart to the newest version, we receive these errors.

The only way to get 1.15.4 to work is to add this to the nodeinit section:

  nodeinit:
    enabled: true
    reconfigureKubelet: true
    removeCbrBridge: true
    image:
      tag: "62093c5c233ea914bfa26a10ba41f8780d9b737f"

However, this doesn't work in 1.15.5.

dlahn avatar May 23 '24 16:05 dlahn

Any ideas here?

dlahn avatar May 29 '24 17:05 dlahn

Hi! I raised this issue on the Kubernetes GitHub: https://github.com/kubernetes/kubernetes/issues/125069. I don't know if your error is related to that, but I couldn't start Cilium with version 1.15.5 either, because the pod annotations were removed and replaced by an appArmorProfile of type Unconfined. But appArmorProfile Unconfined doesn't work for me with containerd. So if you also use containerd, you can try adding the annotations back, as in 1.15.4:

  container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
  container.apparmor.security.beta.kubernetes.io/clean-cilium-state: "unconfined"
  container.apparmor.security.beta.kubernetes.io/mount-cgroup: "unconfined"
  container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: "unconfined"
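One way to add the annotations above back is through the Helm values, rather than patching the DaemonSet by hand. The sketch below assumes the chart exposes a top-level `podAnnotations` key for the agent pods; check the values.yaml of your chart version before relying on it. The annotation keys and values are taken verbatim from this comment.

```yaml
# Sketch only: assumes the Cilium chart's top-level `podAnnotations` value
# is applied to the agent pods. Verify against your chart's values.yaml.
podAnnotations:
  # One annotation per container name, as they appeared in 1.15.4.
  container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
  container.apparmor.security.beta.kubernetes.io/clean-cilium-state: "unconfined"
  container.apparmor.security.beta.kubernetes.io/mount-cgroup: "unconfined"
  container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: "unconfined"
```

Each annotation key ends in a container name, so an annotation only takes effect if a container with that name exists in the pod.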

jbmolle avatar Jun 04 '24 13:06 jbmolle

Adding these annotations seems to have resolved the issue for us.

dlahn avatar Jun 04 '24 18:06 dlahn

I ended up with the same issue/solution, found it by doing a helm chart diff.

Which leads to the follow-up question of why containerd doesn't support the new profile type. Are you also running Rancher RKE2?

danieljkemp avatar Jun 25 '24 13:06 danieljkemp

For me, containerd is working fine in the end. The problem came from the OpenTelemetry operator, which has a mutating admission webhook that was removing the appArmorProfile key from the pod definition. They needed to update to Go 1.22 before they could pull the latest schemas from K8s and accept the appArmorProfile key. For now I've removed the OpenTelemetry operator and my Cilium is working fine again. The fix in the OpenTelemetry operator is done, so we just need to wait for the next release and everything should be good. If you're not using the OpenTelemetry operator, you should check whether you have other components that use a mutating admission webhook.

jbmolle avatar Jun 25 '24 13:06 jbmolle

Just cert-manager and cnpg for those (and an istio-sidecar-injector, despite having removed Istio a while ago).

Weirdly, Cilium did run fine after adding the annotations back? Haven't tried with a new cluster just yet.

danieljkemp avatar Jun 25 '24 13:06 danieljkemp

Well, cert-manager is not the problem. I'm using it too, and its mutating webhook is not transforming the securityContext. If it's a mutating webhook problem, it's expected that the annotations work. K8s will look for either appArmorProfile in the securityContext or the annotations to give the container the correct AppArmor permissions. If your webhook removes appArmorProfile from the securityContext but not the annotations, then K8s still receives what it needs.
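To make the two mechanisms concrete, here is a minimal sketch of a pod that declares the AppArmor profile both ways. The pod name and image are hypothetical placeholders, not taken from the Cilium chart; only the annotation key and the appArmorProfile field come from this thread.

```yaml
# Illustrative pod only: name and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-example
  annotations:
    # Older form: a beta annotation keyed by container name.
    container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
spec:
  containers:
    - name: cilium-agent
      image: example.invalid/agent:placeholder
      securityContext:
        # Newer form: a field in the container's securityContext.
        # A webhook working from an older schema may drop this key.
        appArmorProfile:
          type: Unconfined
```

If a mutating webhook strips the securityContext field but leaves the annotations untouched, the annotation form is what remains on the admitted pod, which would explain why adding the annotations back works.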

jbmolle avatar Jun 25 '24 13:06 jbmolle

I don't know cnpg, but a quick look shows that there are some references to appArmorProfile in releases/cnpg-1.23.2.yaml. Those references are not there for versions older than 1.23.2, so maybe you aren't using the latest version?

jbmolle avatar Jun 25 '24 13:06 jbmolle

Yeah, but that webhook only applies to cnpg Postgres backup objects according to its rules. Either way, the created Cilium pods end up with the AppArmor context defined as expected in the running pod spec.

danieljkemp avatar Jun 25 '24 13:06 danieljkemp

Fixed, see the solution in here

aanm avatar Jul 12 '24 21:07 aanm