adb-atomic-developer-bundle

SELinux vs Kubernetes volumes

Open kadel opened this issue 9 years ago • 31 comments

SELinux inside ADB blocks the use of emptyDir and hostPath volumes and prevents passing secrets to containers as volumes.

emptyDir and hostPath volumes

If you create a pod with a volume that is either emptyDir or hostPath, SELinux blocks access to that directory inside the container.

example

busybox.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
      - sleep
      - "36000"
    volumeMounts:
      - name: storage
        mountPath: /storage
  volumes:
    - name: storage
      hostPath:
        path: /tmp/storage 
kubectl create -f busybox.yaml
kubectl exec -it busybox /bin/sh
/ # touch /storage/asdf
touch: /storage/asdf: Permission denied

I have to manually change the SELinux context of /tmp/storage on the host to get this working:

chcon -Rt svirt_sandbox_file_t /tmp/storage

The same issue occurs with emptyDir volumes. That situation is a bit more complicated, because emptyDir volumes are created under a path containing the pod UID, e.g. /var/lib/kubelet/pods/4dd7b77b-8228-11e5-b8db-525400e09276/volumes/kubernetes.io~empty-dir/.
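For emptyDir the manual workaround therefore has to target the per-pod path, roughly like this (a sketch along the lines of the secret workaround below, assuming the pod is named busybox):

# look up the pod UID, then relabel that pod's emptyDir directory on the host
PODID=$(kubectl get pod busybox -o template --template="{{ .metadata.uid }}")
chcon -Rt svirt_sandbox_file_t /var/lib/kubelet/pods/$PODID/volumes/kubernetes.io~empty-dir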

Kubernetes secrets

SELinux also blocks access to secret volumes, so you cannot pass secrets from Kubernetes to containers.

example:

secret.yaml

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  password: dmFsdWUtMg0K
  username: dmFsdWUtMQ0K

busybox.yaml

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
      - sleep
      - "36000"
    volumeMounts:
      - name: storage
        mountPath: /storage
      - name: secret
        mountPath: /secret

  volumes:
    - name: storage
      hostPath:
        path: /tmp/storage 
    - name: secret
      secret:
        secretName: mysecret
kubectl  create  -f secret.yaml
kubectl  create  -f busybox.yaml

Now you would expect to be able to access /secret/username and /secret/password inside the container, but:

# kubectl exec -it busybox /bin/sh
# ls /secret/
ls: /secret/username: Permission denied
ls: /secret/password: Permission denied

Getting this to work manually is a bit of a pain:

# first you need to get the pod UID
PODID=$(kubectl get pod busybox -o template --template="{{ .metadata.uid }}")
# then you can change the SELinux context of the volume on the host
chcon -Rt svirt_sandbox_file_t /var/lib/kubelet/pods/$PODID/volumes/kubernetes.io~secret
# kubectl exec -it busybox /bin/sh
# cat /secret/*
value-2
value-1

It is quite a pain to work with emptyDir and secret volumes inside ADB; you always have to think about chcon. When you stop a pod it is recreated with a new UID, so you have to chcon the secret and emptyDir volumes again :-(

I don't know whether this is a Kubernetes problem or something with the SELinux setup inside ADB.

I think that in the hostPath case it is OK to require a manual chcon, but for emptyDir and secrets it should happen automatically.

My idea is to automatically label everything in /var/lib/kubelet/pods/*/volumes/ as svirt_sandbox_file_t.
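One way to sketch that on the host would be a persistent file-context rule (a hypothetical sketch; the chcon above is the one-off equivalent):

# add a default file-context rule for the kubelet volume tree and apply it to existing content
semanage fcontext -a -t svirt_sandbox_file_t "/var/lib/kubelet/pods(/.*)?"
restorecon -R /var/lib/kubelet/pods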

kadel avatar Nov 03 '15 14:11 kadel

My idea is to automatically label everything in /var/lib/kubelet/pods/*/volumes/ as svirt_sandbox_file_t.

I think this can be handled by the k8s SELinux rules (maybe by adding those entries to the spec file), but it should be discussed with @rhatdan to understand whether there are any drawbacks.

praveenkumar avatar Dec 11 '15 06:12 praveenkumar

If you label everything in this directory with that label (::svirt_sandbox_file_t:s0), then all containers will be able to read and write all content there. If you use "Z" when volume mounting into a container, then docker will relabel the content to something that is private to the container.
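For illustration, a minimal docker invocation showing that behaviour (the host path is hypothetical):

# Z makes docker relabel the volume content with an MCS pair private to this container
docker run --rm -v /tmp/data:/data:Z busybox touch /data/ok
ls -Zd /tmp/data    # now svirt_sandbox_file_t with a unique s0:cX,cY pair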

rhatdan avatar Dec 11 '15 14:12 rhatdan

Is there any progress on this issue? It blocks using Kubernetes with Atomic Host in production.

ediskandarov avatar Dec 25 '15 09:12 ediskandarov

A proposal to ensure the SELinux context is correct is already available on the Kubernetes GitHub (https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/selinux.md), and it is used in the OpenShift downstream work (https://github.com/kubernetes/kubernetes/pull/9844). The PR says it's already merged. I will test the latest branch and check whether that is something we want for ADB.

praveenkumar avatar Jan 11 '16 11:01 praveenkumar

I tried the latest Kubernetes (updates-testing repo) on Fedora 23, and emptyDir volumes still have the permission issue.

# cat busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
      - sleep
      - "36000"
    volumeMounts:
      - name: storage
        mountPath: /storage
  volumes:
    - name: storage
      emptyDir: {}

# kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   0          40s

# kubectl exec -it busybox /bin/sh
/ # cd /storage/
/storage # touch hello
touch: hello: Permission denied

But as per https://github.com/kubernetes/kubernetes/issues/15883 it should be resolved for Fedora, so I will discuss it with @pmorie.

praveenkumar avatar Jan 12 '16 07:01 praveenkumar

I had a discussion with @pmorie, and it looks like labeling /var/lib/kubelet/pods with ::svirt_sandbox_file_t:s0 will be good enough for this issue, because the latest version of k8s (https://github.com/kubernetes/kubernetes/commit/1d352a16b8e766eabe7ab75ebddc43f07a6fcdd0) has Z added for emptyDir, which relabels it to a different context, as @rhatdan said.

Now we have to make sure the k8s shipped with ADB has this fix.

praveenkumar avatar Jan 14 '16 06:01 praveenkumar

@praveenkumar ADB/CDK is based on CentOS/RHEL, so it needs to be fixed in RHEL; an ADB/CDK rebuild will then pick it up.

We should file a bug against RHEL, and in the meantime we can put a workaround in the kickstart file.

LalatenduMohanty avatar Jan 14 '16 13:01 LalatenduMohanty

@LalatenduMohanty Yes, I filed an issue against RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1298568

praveenkumar avatar Jan 14 '16 18:01 praveenkumar

Some more observations: I checked whether stock k8s auto-labels shared volumes, but it looks like it still lacks this.

Plain docker container

 # docker run -d -v /root/hello/:/strorage:rw,Z --name test docker.io/busybox /bin/sleep 36000
980411a0c1ee9c4689b4ecde78810c0bc712e14d88e04f205aa57a2fdb77e830
# ls -Zl .
total 8
-rw-------. 1 system_u:object_r:admin_home_t:s0 root root 4283 Apr  5 02:13 anaconda-ks.cfg
drwxr-xr-x. 2 system_u:object_r:svirt_sandbox_file_t:s0:c499,c672 root root    6 Apr  6 02:30 hello
# docker inspect test | grep -i mount
    "MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c499,c672", 
   "Mounts": [
        {
            "Source": "/root/hello",
            "Destination": "/strorage",
            "Mode": "rw,Z",
            "RW": true
        }

k8s container

Using the same template as I pointed to above (also used by @kadel).

# kubectl create -f busybox.yaml
# kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   1          18h
# docker ps
7ccc35980a91        busybox                                                      "sleep 36000"        8 hours ago         Up 8 hours                              k8s_busybox.ed662816_busybox_default_76979d50-fb2b-11e5-8b91-52540099d993_a9d80ec0
d716252fea3e        registry.access.redhat.com/rhel7/pod-infrastructure:latest   "/pod"               18 hours ago        Up 18 hours                             k8s_POD.ae8ee9ac_busybox_default_76979d50-fb2b-11e5-8b91-52540099d993_c2e36ba7
# docker inspect test | grep -i mount
   "MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c499,c672",
   "Mounts": [
        {
            "Source": "/var/lib/kubelet/pods/76979d50-fb2b-11e5-8b91-52540099d993/volumes/kubernetes.io~empty-dir/storage",
            "Destination": "/storage",
            "Mode": "",
            "RW": true
        },

# ls -Zl /var/lib/kubelet/pods/76979d50-fb2b-11e5-8b91-52540099d993/volumes/kubernetes.io~empty-dir
total 0
drwxrwxrwx. 2 system_u:object_r:svirt_sandbox_file_t:s0 root root 27 Apr  5 12:11 storage

Conclusion

So k8s does relabel it; the only thing I am worried about is the empty Mount mode.

praveenkumar avatar Apr 06 '16 08:04 praveenkumar

@praveenkumar: Can you retry with the following?

# cat busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
      - sleep
      - "36000"
    volumeMounts:
      - name: storage
        mountPath: /storage
  volumes:
    - name: storage
      emptyDir: {}
      SELinuxRelabel: true

navidshaikh avatar Apr 06 '16 11:04 navidshaikh

@navidshaikh kubectl is not able to parse the SELinuxRelabel field in this manifest file:

# kubectl create -f busybox.yaml 
error validating "busybox.yaml": error validating data: found invalid field SELinuxRelabel for v1.Volume; if you choose to ignore these errors, turn validation off with --validate=false

praveenkumar avatar Apr 06 '16 12:04 praveenkumar

The relabel is only going to happen the first time, not on each container start, so not having it in the inspect output might be all right. I'm not sure how k8s does the relabel, though.

rhatdan avatar Apr 06 '16 13:04 rhatdan

@rhatdan Please help me understand: if we make the change below in our kickstart file, which generates the Vagrant box that consumes those services, would there be any issue?

# chcon -Rt svirt_sandbox_file_t /var/lib/kubelet/pods

and then, once the boxes are built, we would have access to the mounted volumes:

# kubectl create -f busybox.yaml

praveenkumar avatar Apr 06 '16 13:04 praveenkumar

If you do that labeling, it means that all pods/containers can read and write all other pods' content.

All container processes can read/write content with the svirt_sandbox_file_t type, if they have an MCS Label of S0. In order to get them isolated you want them to have different MCS labels like s0:c1,c2 versus s0:c3,c4. The docker daemon and RKT do this automatically. If you are volume mounting into the container then from a docker point of view, mounting using Z will cause docker to relabel the content to a private MCS type at container creation. If you stop and start the container without recreating it, the labeling will not happen again.
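A quick way to see those per-container MCS pairs on the host is to look at the process labels, for example:

# each container's processes should carry a distinct MCS pair, e.g. s0:c376,c897 vs s0:c380,c695
ps axZ | grep svirt_lxc_net_t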

rhatdan avatar Apr 06 '16 18:04 rhatdan

@praveenkumar

So k8s does relabel it; the only thing I am worried about is the empty Mount mode.

I'm not exactly sure what the question is here.

pmorie avatar Apr 06 '16 18:04 pmorie

All container processes can read/write content with the svirt_sandbox_file_t type, if they have an MCS Label of S0. In order to get them isolated you want them to have different MCS labels like s0:c1,c2 versus s0:c3,c4. The docker daemon and RKT do this automatically.

@rhatdan Yes, agreed. I got the point that container processes have r/w access because of the MCS label of s0, and in my experiment with docker, adding Z to the mount point did change the MCS labels automatically. rkt might also do it automatically, as you said, but our use case is Kubernetes, which has no SELinux relabel parameter in the pod manifest and does not do it automatically either (as per my experiment). Based on what you said, I suppose it is not good practice to set the SELinux context directory-wide, since auto-labeling is not happening here.

praveenkumar avatar Apr 07 '16 03:04 praveenkumar

I'm not exactly sure what the question is here.

@pmorie So I was wondering whether there is something in the pod manifest to define the mount mode as a parameter and set it to Z, but I didn't find any such parameter.

praveenkumar avatar Apr 07 '16 03:04 praveenkumar

K8s should probably have a flag like private and shared in the volume label that could then be translated into :z and :Z within the docker command.
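For reference, the docker-level distinction such a flag would map to (host paths are hypothetical):

docker run -v /tmp/shared:/data:z busybox true     # z: shared relabel, readable/writable by all containers
docker run -v /tmp/private:/data:Z busybox true    # Z: private relabel with this container's MCS pair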

rhatdan avatar Apr 07 '16 12:04 rhatdan

@rhatdan

K8s should probably have a flag like private and shared in the volume label that could then be translated into :z and :Z within the docker command.

There's a community PR that adds mount propagation support now; I expect we'll have it in 1.3

@praveenkumar

@pmorie So I was wondering whether there is something in the pod manifest to define the mount mode as a parameter and set it to Z, but I didn't find any such parameter.

Kubernetes does support relabeling emptyDir volumes, but you need to explicitly define the SELinux context in the pod spec. There are two places to do it: a pod-level securityContext field in the pod spec, or a securityContext field on a container. In your case, I would recommend using the pod's security context, like so:

apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:
  containers:
    - name: hello-world-container
      # The container definition
      # ...
      securityContext:
        privileged: true
        seLinuxOptions:
          level: "s0:c123,c456"

It's possible to specify the user, role, and type in this field as well, but you really only need to specify the level; a fuller sketch follows below. Note that OpenShift automatically allocates a level for your pod and sets this piece of the pod spec for you.
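A rough sketch of the full field set, with hypothetical values; as noted, only level normally needs to be set:

securityContext:
  seLinuxOptions:
    user: "system_u"          # hypothetical values, for illustration only
    role: "system_r"
    type: "svirt_lxc_net_t"
    level: "s0:c123,c456"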

See the docs for more information: http://kubernetes.io/docs/user-guide/security-context/

pmorie avatar Apr 07 '16 13:04 pmorie

@navidshaikh @praveenkumar

for the record, SELinuxRelabel is not a real thing in the Kubernetes API:

  volumes:
    - name: storage
      emptyDir: {}
      SELinuxRelabel: true

pmorie avatar Apr 07 '16 13:04 pmorie

@pmorie I got some time today to test this out, and I'm getting Error from server: error when creating "busybox.yaml": pods "busybox" is forbidden: SecurityContext.SELinuxOptions is forbidden. Any idea what else we should configure to make it work?

$ cat busybox.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
      - sleep
      - "36000"
    volumeMounts:
      - name: storage
        mountPath: /storage
    securityContext:
      privileged: true
      seLinuxOptions:
        level: "s0:c123,c456"
  volumes:
    - name: storage
      emptyDir: {}

praveenkumar avatar Apr 13 '16 11:04 praveenkumar

@praveenkumar Try removing SecurityContextDeny from KUBE_ADMISSION_CONTROL= in /etc/kubernetes/apiserver
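For example, the resulting line in /etc/kubernetes/apiserver would then look something like this (the exact controller list on your system may differ):

KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota"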

I've been poking at this in the kubernetes/contrib ansible scripts context for a little bit. I wrote up some notes on what I've found here.

jasonbrooks avatar Apr 19 '16 19:04 jasonbrooks

@praveenkumar Try removing SecurityContextDeny from KUBE_ADMISSION_CONTROL= in /etc/kubernetes/apiserver

@jasonbrooks If I remove "SecurityContextDeny" then it will disable the SELinux context for pods, which is something I don't want.

I've been poking at this in the kubernetes/contrib ansible scripts context for a little bit. I wrote up some notes on what I've found here.

That's helpful. I also ran sudo chcon -Rt svirt_sandbox_file_t /var/lib/kubelet for the time being, but relabeling is still not happening.

praveenkumar avatar Apr 22 '16 05:04 praveenkumar

@praveenkumar Do you mean that removing SecurityContextDeny would allow a user to disable the SELinux context? I'm looking at the ADB now, with the setting changed, and the processes from two different pods are still being labeled differently. Maybe I'm looking at the wrong thing. How are you confirming whether the SELinux context is disabled? @pmorie

[vagrant@centos7-adb ~]$ ps axZ | grep nginx

system_u:system_r:svirt_lxc_net_t:s0:c376,c897 13763 ? Ss   0:00 nginx: master process nginx -g daemon off;
system_u:system_r:svirt_lxc_net_t:s0:c376,c897 13768 ? S   0:00 nginx: worker process

system_u:system_r:svirt_lxc_net_t:s0:c380,c695 14066 ? Ss   0:00 nginx: master process nginx -g daemon off;
system_u:system_r:svirt_lxc_net_t:s0:c380,c695 14074 ? S   0:00 nginx: worker process

jasonbrooks avatar Apr 22 '16 17:04 jasonbrooks

@praveenkumar, @jasonbrooks is right -- SecurityContextDeny is an admission controller that rejects pods that try to use security context. So, remove that from your list.

Btw, apologies for missing the notification on this one, @praveenkumar

pmorie avatar Apr 22 '16 17:04 pmorie

@pmorie As per our IRC conversation yesterday, I made the modifications you suggested, and below is the result.

Step-1: Volume dir has selinux type svirt_sandbox_file_t

# ls -Zl /var/lib/ | grep kubelet
drwxr-xr-x. 4 system_u:object_r:svirt_sandbox_file_t:s0 root    root      31 Apr 22 00:40 kubelet

Step-2: Cluster does not use the 'SecurityContextDeny' admission controller

cat /etc/kubernetes/apiserver | grep KUBE_ADMISSION_CONTROL
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota"

Step-3: Explicitly set the parts of the selinux context that you want to be set, in the pod's security context

$ cat busybox.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
      - sleep
      - "36000"
    volumeMounts:
      - name: storage
        mountPath: /storage
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"
  volumes:
    - name: storage
      emptyDir: {}

# docker inspect k8s_busybox.cd717296_busybox_default_05309069-0bca-11e6-9fd9-525400ed468d_3061e077  | grep -i mount
    "MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c123,c456",

#  ls -Zl /var/lib/kubelet/pods/05309069-0bca-11e6-9fd9-525400ed468d/volumes/
total 0
drwxr-xr-x. 3 system_u:object_r:svirt_sandbox_file_t:s0 root root 20 Apr 26 12:14 kubernetes.io~empty-dir
drwxr-xr-x. 3 system_u:object_r:svirt_sandbox_file_t:s0 root root 32 Apr 26 12:14 kubernetes.io~secret

I can still see that it is not relabeled; is that expected?

praveenkumar avatar Apr 27 '16 08:04 praveenkumar

So yesterday I again had a conversation with @pmorie, and we were able to put together workaround steps for it.

Step-1: Volume dir has selinux type svirt_sandbox_file_t

# ls -Zl /var/lib/ | grep kubelet
drwxr-xr-x. 4 system_u:object_r:svirt_sandbox_file_t:s0 root    root      31 Apr 22 00:40 kubelet

Step-2: Cluster must not use the 'SecurityContextDeny' admission controller

cat /etc/kubernetes/apiserver | grep KUBE_ADMISSION_CONTROL
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota"

Step-3: Explicitly set the parts of the selinux context that you want to be set, in the pod's security context

$ cat busybox.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
      - sleep
      - "36000"
    volumeMounts:
      - name: storage
        mountPath: /storage
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"
  volumes:
    - name: storage
      emptyDir: {}

# docker inspect k8s_busybox.cd717296_busybox_default_05309069-0bca-11e6-9fd9-525400ed468d_3061e077  | grep -i mount
    "MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c123,c456",

#  ls -Zl /var/lib/kubelet/pods/4f5704d4-0c8b-11e6-9056-5254006f8934/volumes/kubernetes.io~empty-dir/
total 0
drwxrwxrwx. 2 system_u:object_r:svirt_sandbox_file_t:s0:c123,c456 root root 6 Apr 27 11:18 storage

We need to put these steps in the documentation so users can refer to them.

praveenkumar avatar Apr 28 '16 05:04 praveenkumar

@LalatenduMohanty, @kadel We now have a workaround for this issue, and it should be documented.

praveenkumar avatar Jun 03 '16 06:06 praveenkumar

@praveenkumar I want to fix it before June 24th 2016 (most likely in vagrant-service-manager). Assigning it to myself.

LalatenduMohanty avatar Jun 03 '16 06:06 LalatenduMohanty

@LalatenduMohanty This is mostly on the documentation side; there is a very small change required on the kickstart file side, and it should be done before June 7 so it can be part of the 2.1 release.

praveenkumar avatar Jun 03 '16 09:06 praveenkumar