
k8s: root cannot chown in emptyDir volume mount

Open • iamnoah opened this issue on May 25 '22 • 5 comments

Maybe this is expected, but I have workloads that fail under Sysbox because root cannot change the owner of files and directories in a shared (emptyDir) volume mount.

I create this pod:

apiVersion: v1
kind: Pod
metadata:
  name: sysbox-test
  namespace: default
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  runtimeClassName: sysbox-runc
  containers:
    - name: ubu-bio-systemd-docker
      image: registry.nestybox.com/nestybox/ubuntu-bionic-systemd-docker
      command: ["/sbin/init"]
      volumeMounts:
        - mountPath: /tmp/share
          name: share
    - name: foo
      image: ubuntu
      command: ["/bin/sleep", "180"]
      securityContext:
        runAsUser: 999
        runAsGroup: 999
      volumeMounts:
        - mountPath: /tmp/share
          name: share
  securityContext:
    fsGroup: 999

  restartPolicy: Always
  volumes:
    - name: share
      emptyDir:
        medium: Memory

Then, from a shell in the system container:

kubectl exec -it sysbox-test -- /bin/bash

root@sysbox-test:/# mkdir /tmp/share/foo
root@sysbox-test:/# ls -lhd /tmp/share/foo/
drwxr-sr-x 2 root nogroup 40 May 25 18:20 /tmp/share/foo/
root@sysbox-test:/# chown 999:999 /tmp/share/foo/
chown: changing ownership of '/tmp/share/foo/': Operation not permitted

I can work around it, but this works without Sysbox.
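One possible explanation, offered as an assumption rather than a confirmed diagnosis: with the "auto" user-namespace mode, the kubelet applies fsGroup 999 as a host-side GID, and if host GID 999 is not mapped into the container's user namespace, the directory's group is displayed as nogroup. To change an inode's owner, root inside a user namespace needs CAP_CHOWN there, and the kernel only honors it when the inode's current UID and GID are both mapped into that namespace, which would yield exactly this EPERM. The hypothetical helper below (not part of Sysbox or Kubernetes) checks whether a host ID falls inside any range of a uid_map/gid_map file, whose lines have the form "inside outside length":

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if HOST_ID lies inside one of the ranges listed in
# MAP_FILE (default: this process's gid_map) and print the in-namespace ID it maps to.
# Each map line has three columns: <id-inside-ns> <id-outside-ns> <length>.
host_id_is_mapped() {
  local host_id=$1 map_file=${2:-/proc/self/gid_map}
  awk -v id="$host_id" '
    BEGIN { id += 0 }                      # force numeric comparison
    id >= $2 && id < $2 + $3 { found = 1; print "mapped to " ($1 + id - $2) }
    END { exit !found }                    # exit 0 only if some range matched
  ' "$map_file"
}

# Example, run inside the system container (assumes fsGroup 999 is a host GID):
#   host_id_is_mapped 999 /proc/self/gid_map || echo "host GID 999 is not mapped (shows as nogroup)"
```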

iamnoah, May 25 '22 18:05

Hi @iamnoah, thanks for giving Sysbox a shot.

I suspect the fsGroup setting is causing the problem; could you try without it?

That is, remove this:

  securityContext:
    fsGroup: 999

Thanks!

ctalledo, May 26 '22 13:05

@ctalledo removing it does allow root to chown, but then, of course, the volume no longer has the setgid bit and is not owned by GID 999.
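The two properties at stake here, the setgid bit on the volume directory and group ownership by GID 999, can be checked directly from a shell. A minimal sketch; the function name is hypothetical, and stat -c is the GNU coreutils form:

```shell
# Hypothetical helper: succeed only if DIR is a directory that is group-owned by the
# numeric GID and has the setgid bit set (the layout fsGroup is expected to produce).
has_fsgroup_layout() {
  local dir=$1 gid=$2
  [ -d "$dir" ] && [ -g "$dir" ] && [ "$(stat -c '%g' "$dir")" = "$gid" ]
}

# Example: has_fsgroup_layout /tmp/share 999 && echo "setgid, group 999"
```

Run once with fsGroup set and once without, this should distinguish the two configurations discussed here.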

iamnoah, May 26 '22 13:05

Thanks @iamnoah; let me reproduce this on my end and get back to you later. I am out of office right now, so my response may be a bit delayed.

ctalledo, May 26 '22 15:05

Hi @iamnoah, using the same YAML you provided, things work fine for me with the latest sysbox-deploy-k8s (v0.5.2):

root@sysbox-test:/tmp/share# mkdir /tmp/share/foo
root@sysbox-test:/tmp/share# l
total 0
drwxr-sr-x 2 root docker 40 May 27 16:00 foo
root@sysbox-test:/tmp/share# chown 999:999 /tmp/share/foo/
root@sysbox-test:/tmp/share# l
total 0
drwxr-sr-x 2 999 docker 40 May 27 16:00 foo
root@sysbox-test:/tmp/share# l -n 
total 0
drwxr-sr-x 2 999 999 40 May 27 16:00 foo

Did you use that same version? (kubectl -n kube-system describe pod <sysbox-deploy-k8s-pod-name> will show you.) If not, could you try with it?
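To compare an installed version against v0.5.2, a small version-sort check is enough. This is a sketch under assumptions: it treats the sysbox-deploy-k8s image tag as the installed version, and the DaemonSet name in the example comment is assumed, not confirmed in this thread.

```shell
# Hypothetical helper: succeed if version HAVE >= version WANT (a leading "v" is ignored).
# Relies on GNU sort's -V (natural version ordering).
version_at_least() {
  local have=${1#v} want=${2#v}
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example (DaemonSet name assumed):
#   image=$(kubectl -n kube-system get daemonset sysbox-deploy-k8s \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')
#   version_at_least "${image##*:}" v0.5.2 && echo "v0.5.2 or newer"
```

Version sort is used instead of a plain string comparison so that, for example, 0.5.10 correctly sorts after 0.5.2.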

ctalledo, May 27 '22 16:05

I was on 0.5.1 and tried upgrading to 0.5.2, but I get the same result. Here is the resulting node description (kubectl describe node):

Name:               ip-10-0-146-171.ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.large
                    beta.kubernetes.io/os=linux
                    crio-runtime=running
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=terraform-20220523214611433500000001
                    eks.amazonaws.com/nodegroup-image=ami-00f8662da7d2ffc72
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-0175e4cc4d15837ed
                    eks.amazonaws.com/sourceLaunchTemplateVersion=2
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-146-171
                    kubernetes.io/os=linux
                    node-group-ami-type=ubuntu
                    node.kubernetes.io/instance-type=m4.large
                    sandbox-pods-using=sysbox
                    sysbox-install=yes
                    sysbox-runtime=running
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1b
                    vpc.amazonaws.com/has-trunk-attached=false
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 31 May 2022 11:53:47 -0500
Taints:             ReservedForSandboxedPod=sysbox:NoSchedule
Unschedulable:      false
...
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  KernelDeadlock       False   Tue, 31 May 2022 12:28:39 -0500   Tue, 31 May 2022 11:58:32 -0500   KernelHasNoDeadlock          kernel has no deadlock
  ReadonlyFilesystem   False   Tue, 31 May 2022 12:28:39 -0500   Tue, 31 May 2022 11:58:32 -0500   FilesystemIsNotReadOnly      Filesystem is not read-only
  MemoryPressure       False   Tue, 31 May 2022 12:26:34 -0500   Tue, 31 May 2022 11:53:47 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 31 May 2022 12:26:34 -0500   Tue, 31 May 2022 11:53:47 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 31 May 2022 12:26:34 -0500   Tue, 31 May 2022 11:53:47 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 31 May 2022 12:26:34 -0500   Tue, 31 May 2022 11:55:07 -0500   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
 ...
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           20263484Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      8139476Ki
  pods:                        20
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         1930m
  ephemeral-storage:           17601085k
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7550676Ki
  pods:                        20
System Info:
  ...
  Kernel Version:             5.13.0-1023-aws
  OS Image:                   Ubuntu 20.04.4 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.21.7
  Kubelet Version:            v1.21.9
  Kube-Proxy Version:         v1.21.9

Do the CRI-O and kubelet versions need to match for some reason?
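The two versions reported above (cri-o://1.21.7 and v1.21.9) differ only in the patch number. According to the CRI-O README, CRI-O minor versions track Kubernetes minor versions (CRI-O 1.21.x for Kubernetes 1.21), so a first-pass check is to compare MAJOR.MINOR of the two fields. A minimal sketch with a hypothetical helper; the jsonpath fields in the comment are the Node status fields that kubectl describe prints as "Container Runtime Version" and "Kubelet Version", and NODE is a placeholder:

```shell
# Hypothetical helper: print MAJOR.MINOR from strings such as "cri-o://1.21.7" or "v1.21.9".
major_minor() {
  local v=${1##*/}   # strip a "cri-o://"-style scheme prefix, if any
  v=${v#v}           # strip a leading "v"
  printf '%s\n' "$v" | cut -d . -f 1,2
}

# Example (NODE is a placeholder for the node name):
#   rt=$(kubectl get node "$NODE" -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}')
#   kl=$(kubectl get node "$NODE" -o jsonpath='{.status.nodeInfo.kubeletVersion}')
#   [ "$(major_minor "$rt")" = "$(major_minor "$kl")" ] && echo "same minor version"
```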

iamnoah, May 31 '22 17:05