
calico-node: Unable to chown /var/lib/calico/ on EKS when nonPrivileged is Enabled

Open aittam opened this issue 1 year ago • 5 comments

Following the guide at https://docs.tigera.io/calico/latest/network-policy/non-privileged, I added nonPrivileged: Enabled to the Installation CR, and the calico-node pods now fail to start. This is on EKS, using Amazon Linux 2 as the AMI.

Expected Behavior

Using an Installation CR like the following:

spec:
  calicoNetwork:
    bgp: Disabled
    linuxDataplane: Iptables
  cni:
    ipam:
      type: AmazonVPC
    type: AmazonVPC
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  imagePath: my-calico
  imagePullSecrets:
  - name: mysecret
  kubeletVolumePluginPath: /var/lib/kubelet
  kubernetesProvider: EKS
  nodeUpdateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  nonPrivileged: Enabled
  registry: myregistry.com/
  variant: Calico

With this CR, we expect the calico-node pods to start correctly.

Current Behavior

The calico-node pods are all failing with:

2023-05-15 14:24:37.198 [PANIC][1] hostpath-init/hostpath_init.go 51: Unable to chown /var/lib/calico/
panic: (*logrus.Entry) 0xc00070c700

goroutine 1 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc00070c690, 0x0, {0xc0003869e0, 0x20})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:260 +0x4a7
github.com/sirupsen/logrus.(*Entry).Log(0xc00070c690, 0x0, {0xc00087fe70?, 0x1c0004ebb94?, 0x49a7008?})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:304 +0x4f
github.com/sirupsen/logrus.(*Logger).Log(0xc00013e000, 0x0, {0xc00087fe70, 0x1, 0x1})
	/go/pkg/mod/github.com/sirupsen/[email protected]/logger.go:204 +0x65
github.com/sirupsen/logrus.(*Logger).Panic(...)
	/go/pkg/mod/github.com/sirupsen/[email protected]/logger.go:253
github.com/sirupsen/logrus.Panic(...)
	/go/pkg/mod/github.com/sirupsen/[email protected]/exported.go:129
github.com/projectcalico/calico/node/pkg/hostpathinit.Run()
	/go/src/github.com/projectcalico/calico/node/pkg/hostpathinit/hostpath_init.go:51 +0x18f
main.main()
	/go/src/github.com/projectcalico/calico/node/cmd/calico-node/main.go:182 +0x47c

Possible Solution

Steps to Reproduce (for bugs)

Apply this Installation CR on an EKS cluster (v1.24.12-eks-ec5523e) with Amazon Linux 2 as the AMI:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    bgp: Disabled
    linuxDataplane: Iptables
  cni:
    ipam:
      type: AmazonVPC
    type: AmazonVPC
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  kubeletVolumePluginPath: /var/lib/kubelet
  kubernetesProvider: EKS
  nodeUpdateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  nonPrivileged: Enabled
  variant: Calico

Then check the calico-node pods in the calico-system namespace.

Context

Running Calico with the least amount of privileges possible.

Your Environment

  • Calico version v3.25.1
  • Orchestrator version (e.g. kubernetes, mesos, rkt): EKS v1.24.12-eks-ec5523e
  • Operating System and version: EKS Kubernetes Worker AMI with AmazonLinux2 image, (k8s: 1.24.11, docker: 20.10.17-1.amzn2.0.1, containerd: 1.6.*) ami-01594cdcf27eafc06
  • Link to your project (optional):

aittam avatar May 15 '23 14:05 aittam

Hey @aittam, thanks for raising this. I think we need to investigate a bit more to figure out why this doesn't work. Unfortunately, it looks like our error message does not include the underlying error, so we'll need to change the code to capture that. I suspect the issue is that we do not have the right permissions, but we'll need to dig deeper ourselves to really find out.

mgleung avatar May 16 '23 16:05 mgleung

To add more info: I tried Ubuntu 20.04.6 LTS (kernel 5.15.0-1033-aws) as the AMI and got the same error:

2023-05-31 09:59:33.215 [PANIC][1] hostpath-init/hostpath_init.go 51: Unable to chown /var/lib/calico/
panic: (*logrus.Entry) 0xc0008d6930

goroutine 1 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc0008d68c0, 0x0, {0xc000682c80, 0x20})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:260 +0x4a7
github.com/sirupsen/logrus.(*Entry).Log(0xc0008d68c0, 0x0, {0xc00097fe70?, 0x1c0004ebb94?, 0x49a7008?})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:304 +0x4f
github.com/sirupsen/logrus.(*Logger).Log(0xc0000d2000, 0x0, {0xc00097fe70, 0x1, 0x1})
	/go/pkg/mod/github.com/sirupsen/[email protected]/logger.go:204 +0x65
github.com/sirupsen/logrus.(*Logger).Panic(...)
	/go/pkg/mod/github.com/sirupsen/[email protected]/logger.go:253
github.com/sirupsen/logrus.Panic(...)
	/go/pkg/mod/github.com/sirupsen/[email protected]/exported.go:129
github.com/projectcalico/calico/node/pkg/hostpathinit.Run()
	/go/src/github.com/projectcalico/calico/node/pkg/hostpathinit/hostpath_init.go:51 +0x18f
main.main()
	/go/src/github.com/projectcalico/calico/node/cmd/calico-node/main.go:182 +0x47c

aittam avatar May 31 '23 10:05 aittam

I have the exact same issue, running Calico on EKS with the Amazon Linux 2 AMI. It would also be nice if the docs explained exactly what enabling the nonPrivileged option does.

Mohsen51 avatar Jun 29 '23 08:06 Mohsen51

I had the same error. To get past it, I first had to disable the flexVolume option by setting:

  flexVolumePath: 'None'
  kubeletVolumePluginPath: 'None'

I then had to give the calico-node DaemonSet more resources:

  componentResources:
  - componentName: Node
    resourceRequirements:
      limits:
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi

Only after both changes was I able to get nonPrivileged: Enabled to work.

This was on EKS with Calico 3.26.1, K8s 1.24/1.25, and Bottlerocket 1.13/1.14/1.15.
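
For reference, the two settings described in this comment could be combined with the Installation CR from the original report roughly as in the sketch below. This only assembles fragments already posted in this thread. It has not been verified on Amazon Linux 2 or Ubuntu, and the resource values are the ones reported here rather than a recommendation.

```yaml
# Sketch only: the Installation CR from the original report with the two
# settings from this comment applied. Not a verified fix.
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  kubernetesProvider: EKS
  variant: Calico
  nonPrivileged: Enabled
  calicoNetwork:
    bgp: Disabled
    linuxDataplane: Iptables
  cni:
    type: AmazonVPC
    ipam:
      type: AmazonVPC
  controlPlaneReplicas: 2
  nodeUpdateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  # Setting 1 (from this comment): replaces the host paths used in the original CR.
  flexVolumePath: 'None'
  kubeletVolumePluginPath: 'None'
  # Setting 2 (from this comment): explicit resources for the Node component.
  componentResources:
  - componentName: Node
    resourceRequirements:
      limits:
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi
```

Compared with the original CR, the only differences are the two `'None'` paths and the added `componentResources` entry, which should make it easier to test each change separately.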

Azahorscak avatar Oct 02 '23 14:10 Azahorscak

Any update on this?

pfrydids avatar Mar 19 '24 15:03 pfrydids