adb-atomic-developer-bundle
SELinux vs Kubernetes volumes
SELinux inside ADB blocks the use of emptyDir and hostPath volumes and prevents passing secrets to containers as volumes.
emptyDir and hostPath volumes
If you create a pod with a volume, where the volume is either emptyDir or hostPath, SELinux blocks access to that directory inside the container.
Example:
busybox.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
    - sleep
    - "36000"
    volumeMounts:
    - name: storage
      mountPath: /storage
  volumes:
  - name: storage
    hostPath:
      path: /tmp/storage
kubectl create -f busybox.yaml
kubectl exec -it busybox /bin/sh
/ # touch /storage/asdf
touch: /storage/asdf: Permission denied
I have to manually change the SELinux context of /tmp/storage on the host to get this working:
chcon -Rt svirt_sandbox_file_t /tmp/storage
The same issue occurs with emptyDir volumes. That situation is a bit more complicated, because emptyDir volumes are created in a path containing the pod UID, e.g. /var/lib/kubelet/pods/4dd7b77b-8228-11e5-b8db-525400e09276/volumes/kubernetes.io~empty-dir/
Kubernetes secrets
SELinux also blocks access to secret volumes, so you cannot pass secrets from Kubernetes to containers.
Example:
secret.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  password: dmFsdWUtMg0K
  username: dmFsdWUtMQ0K
busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
    - sleep
    - "36000"
    volumeMounts:
    - name: storage
      mountPath: /storage
    - name: secret
      mountPath: /secret
  volumes:
  - name: storage
    hostPath:
      path: /tmp/storage
  - name: secret
    secret:
      secretName: mysecret
kubectl create -f secret.yaml
kubectl create -f busybox.yaml
Now you should be able to access /secret/username and /secret/password inside the container:
# kubectl exec -it busybox /bin/sh
# ls /secret/
ls: /secret/username: Permission denied
ls: /secret/password: Permission denied
Getting this working manually is a bit of a pain:
# first you need to get the pod UID
PODID=$(kubectl get pod busybox -o template --template="{{ .metadata.uid }}")
# then you can change the security context of the volume on the host
chcon -Rt svirt_sandbox_file_t /var/lib/kubelet/pods/$PODID/volumes/kubernetes.io~secret
# kubectl exec -it busybox /bin/sh
# cat /secret/*
value-2
value-1
It is quite a pain to work with emptyDir and secret volumes inside ADB; you always have to think of chcon. When you stop a pod, it is recreated with a new UID, so you have to chcon the secret and emptyDir volumes again :-(
I don't know if this is a Kubernetes problem or something with the SELinux setup inside ADB.
I think that in the hostPath case it is OK to require a manual chcon, but with emptyDir and with secrets it should be automatic.
My idea is to automatically label everything in /var/lib/kubelet/pods/*/volumes/ with svirt_sandbox_file_t.
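A persistent variant of this idea could be an SELinux file-context rule instead of a one-off chcon (a sketch only; the path and type come from this thread, and it has not been verified against ADB's policy):

```
# add the rule:  semanage fcontext -a -t svirt_sandbox_file_t '/var/lib/kubelet/pods(/.*)?'
# apply it:      restorecon -R /var/lib/kubelet/pods
# resulting entry in file_contexts.local:
/var/lib/kubelet/pods(/.*)?    system_u:object_r:svirt_sandbox_file_t:s0
```

Unlike chcon, such a rule would survive pod recreation and filesystem relabels, but with the same caveat: every container running at level s0 could then read and write every pod's volumes.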
I think this can be handled by the k8s SELinux rules (maybe by adding those entries to the spec file), but it should be discussed with @rhatdan to understand whether there are any drawbacks.
If you label everything in this directory with ::svirt_sandbox_file_t:s0, then all containers will be able to read and write all content there. If you use "Z" when volume mounting into a container, then docker will relabel the content to something that is private to the container.
Is there any progress on this issue? It blocks using Kubernetes with Atomic Host in production.
A proposal is already available on the Kubernetes GitHub to make sure the SELinux context is correct (https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/selinux.md), and it is used in the OpenShift downstream work (https://github.com/kubernetes/kubernetes/pull/9844). The PR says it is already merged. I will test the latest branch and check whether it is something we want for ADB.
I tried the latest Kubernetes (updates-testing repo) on Fedora 23, and emptyDir volumes still have the permission issue.
# cat busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
    - sleep
    - "36000"
    volumeMounts:
    - name: storage
      mountPath: /storage
  volumes:
  - name: storage
    emptyDir: {}
# kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 40s
# kubectl exec -it busybox /bin/sh
/ # cd /storage/
/storage # touch hello
touch: hello: Permission denied
But as per https://github.com/kubernetes/kubernetes/issues/15883 it should be resolved for Fedora, so I will discuss it with @pmorie.
I had a discussion with @pmorie, and it looks like labeling /var/lib/kubelet/pods with ::svirt_sandbox_file_t:s0 will be good enough for this issue, because the latest version of k8s (https://github.com/kubernetes/kubernetes/commit/1d352a16b8e766eabe7ab75ebddc43f07a6fcdd0) has Z added for emptyDir, which relabels it to a different context, as @rhatdan said.
Now we have to check whether the k8s shipped with ADB has this fix or not.
@praveenkumar ADB/CDK is based on CentOS/RHEL, so it needs to be fixed in RHEL, and an ADB/CDK rebuild will pick it up.
We should file a bug against RHEL; in the meantime we can put a workaround in the kickstart file.
@LalatenduMohanty yes, I filed an issue against RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1298568
Some more observations after checking whether stock k8s auto-labels shared volumes; it looks like it still lacks this.
Plain docker container
# docker run -d -v /root/hello/:/strorage:rw,Z --name test docker.io/busybox /bin/sleep 36000
980411a0c1ee9c4689b4ecde78810c0bc712e14d88e04f205aa57a2fdb77e830
# ls -Zl .
total 8
-rw-------. 1 system_u:object_r:admin_home_t:s0 root root 4283 Apr 5 02:13 anaconda-ks.cfg
drwxr-xr-x. 2 system_u:object_r:svirt_sandbox_file_t:s0:c499,c672 root root 6 Apr 6 02:30 hello
# docker inspect test | grep -i mount
"MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c499,c672",
"Mounts": [
{
"Source": "/root/hello",
"Destination": "/strorage",
"Mode": "rw,Z",
"RW": true
}
k8s container
Using the same template as I pointed to above (also used by @kadel):
# kubectl create -f busybox.yaml
# kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 1 18h
# docker ps
7ccc35980a91 busybox "sleep 36000" 8 hours ago Up 8 hours k8s_busybox.ed662816_busybox_default_76979d50-fb2b-11e5-8b91-52540099d993_a9d80ec0
d716252fea3e registry.access.redhat.com/rhel7/pod-infrastructure:latest "/pod" 18 hours ago Up 18 hours k8s_POD.ae8ee9ac_busybox_default_76979d50-fb2b-11e5-8b91-52540099d993_c2e36ba7
# docker inspect test | grep -i mount
"MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c499,c672",
"Mounts": [
{
"Source": "/var/lib/kubelet/pods/76979d50-fb2b-11e5-8b91-52540099d993/volumes/kubernetes.io~empty-dir/storage",
"Destination": "/storage",
"Mode": "",
"RW": true
},
# ls -Zl /var/lib/kubelet/pods/76979d50-fb2b-11e5-8b91-52540099d993/volumes/kubernetes.io~empty-dir
total 0
drwxrwxrwx. 2 system_u:object_r:svirt_sandbox_file_t:s0 root root 27 Apr 5 12:11 storage
Conclusion
So k8s does relabel it, but the only issue I am worried about is the Mount mode.
@praveenkumar: Can you retry with the following?
# cat busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
    - sleep
    - "36000"
    volumeMounts:
    - name: storage
      mountPath: /storage
  volumes:
  - name: storage
    emptyDir: {}
    SELinuxRelabel: true
@navidshaikh kubectl is not able to parse the SELinuxRelabel flag from this manifest file:
# kubectl create -f busybox.yaml
error validating "busybox.yaml": error validating data: found invalid field SELinuxRelabel for v1.Volume; if you choose to ignore these errors, turn validation off with --validate=false
The Relabel is only going to happen the first time, not on each container start, so not having it in the inspect might be alright. Not sure how k8s does the relabel though.
@rhatdan please help me understand: if we make the change below in our kickstart file (which generates the Vagrant box that consumes those services), would there be any issue?
# chcon -Rt svirt_sandbox_file_t /var/lib/kubelet/pods
and then, once the boxes are built, we would have access to mounted volumes:
# kubectl create -f busybox.yaml
If you do that labeling, it means that all pods/containers can read/write all other pods' content.
All container processes can read/write content with the svirt_sandbox_file_t type if they have an MCS label of s0. In order to get them isolated, you want them to have different MCS labels, like s0:c1,c2 versus s0:c3,c4. The docker daemon and rkt do this automatically. If you are volume mounting into the container, then from a docker point of view, mounting using Z will cause docker to relabel the content to a private MCS type at container creation. If you stop and start the container without recreating it, the labeling will not happen again.
@praveenkumar
So k8s does relabel it, but the only issue I am worried about is the Mount mode.
I'm not exactly sure what the question is here.
All container processes can read/write content with the svirt_sandbox_file_t type if they have an MCS label of s0. In order to get them isolated, you want them to have different MCS labels, like s0:c1,c2 versus s0:c3,c4. The docker daemon and rkt do this automatically.
@rhatdan Yes, agreed. I got the point that container processes have read/write access because of the MCS label of s0, and in my experiment with docker, adding Z to the mount point did change the MCS labels automatically. rkt may also do this automatically, as you said, but our use case is Kubernetes, which has no SELinux relabel parameter in the pod manifest and does not do it automatically either (as per my experiment). Based on what you said, I suppose it is not good practice to set the SELinux context directory-wide, since auto-labeling is not happening here.
I'm not exactly sure what the question is here.
@pmorie So I was wondering whether there is something in the pod manifest to define the mount mode as a parameter and set it to Z, but I did not find any such parameter.
K8s should probably have a flag like private and shared on the volume that could then be translated into :z and :Z in the docker command.
@rhatdan
K8s should probably have a flag like private and shared on the volume that could then be translated into :z and :Z in the docker command.
There's a community PR that adds mount propagation support now; I expect we'll have it in 1.3
@praveenkumar
@pmorie So I was wondering if there is something in pod manifest to define mode as parameter and then put value Z to that parameter but didn't get any such param.
Kubernetes does support relabeling emptyDir volumes, but you need to explicitly define the SELinux context in the pod spec. There are two places to do it: a pod-level securityContext field in the pod spec, or a securityContext field on a container. In your case, I would recommend using the pod's security context, like so:
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:
  containers:
  - name: hello-world-container
    # The container definition
    # ...
  securityContext:
    privileged: true
    seLinuxOptions:
      level: "s0:c123,c456"
It's possible to specify the user, role, and type in this field, but you really only need to specify the level. Note, OpenShift automatically allocates a level for your pod and sets this piece of the pod spec for you.
See the docs for more information: http://kubernetes.io/docs/user-guide/security-context/
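For completeness, a fully specified seLinuxOptions block might look like the following (a hypothetical sketch; the user/role/type values mirror the labels seen in the ls -Z and ps -Z output elsewhere in this thread and normally do not need to be set):

```yaml
securityContext:
  seLinuxOptions:
    user: system_u
    role: system_r
    type: svirt_lxc_net_t    # process type used for containers in this thread
    level: "s0:c123,c456"    # usually the only part you need to specify
```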
@navidshaikh @praveenkumar For the record, SELinuxRelabel is not a real thing in the Kubernetes API:
volumes:
- name: storage
  emptyDir: {}
  SELinuxRelabel: true
@pmorie I got some time today to test this out, and I am getting: Error from server: error when creating "busybox.yaml": pods "busybox" is forbidden: SecurityContext.SELinuxOptions is forbidden
Any idea what else we should configure to make it work?
$ cat busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
    - sleep
    - "36000"
    volumeMounts:
    - name: storage
      mountPath: /storage
  securityContext:
    privileged: true
    seLinuxOptions:
      level: "s0:c123,c456"
  volumes:
  - name: storage
    emptyDir: {}
@praveenkumar Try removing SecurityContextDeny from KUBE_ADMISSION_CONTROL= in /etc/kubernetes/apiserver
I've been poking at this in the kubernetes/contrib ansible scripts context for a little bit. I wrote up some notes on what I've found here.
@praveenkumar Try removing SecurityContextDeny from KUBE_ADMISSION_CONTROL= in /etc/kubernetes/apiserver
@jasonbrooks If I remove "SecurityContextDeny", then it will disable the SELinux context for pods, which is something I don't want.
I've been poking at this in the kubernetes/contrib ansible scripts context for a little bit. I wrote up some notes on what I've found here.
That's helpful. I also ran sudo chcon -Rt svirt_sandbox_file_t /var/lib/kubelet for the time being, but relabeling is still not happening.
@praveenkumar Do you mean that removing SecurityContextDeny would allow a user to disable the SELinux context? I'm looking at the ADB now, with the setting changed, and the processes from two different pods are still being labeled differently. Maybe I'm looking at the wrong thing. How are you confirming whether the SELinux context is disabled? @pmorie
[vagrant@centos7-adb ~]$ ps axZ | grep nginx
system_u:system_r:svirt_lxc_net_t:s0:c376,c897 13763 ? Ss 0:00 nginx: master process nginx -g daemon off;
system_u:system_r:svirt_lxc_net_t:s0:c376,c897 13768 ? S 0:00 nginx: worker process
system_u:system_r:svirt_lxc_net_t:s0:c380,c695 14066 ? Ss 0:00 nginx: master process nginx -g daemon off;
system_u:system_r:svirt_lxc_net_t:s0:c380,c695 14074 ? S 0:00 nginx: worker process
@praveenkumar, @jasonbrooks is right: SecurityContextDeny is an admission controller that rejects pods that try to use a security context. So, remove that from your list.
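Concretely, with SecurityContextDeny removed, the line in /etc/kubernetes/apiserver would look something like this (assuming the stock admission-controller list; your remaining controllers may differ):

```
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota"
```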
Btw, apologies for missing the notification on this one, @praveenkumar
@pmorie As per our IRC conversation yesterday, I made the modifications you suggested; below is the result.
Step-1: The volume dir has the SELinux type svirt_sandbox_file_t
# ls -Zl /var/lib/ | grep kubelet
drwxr-xr-x. 4 system_u:object_r:svirt_sandbox_file_t:s0 root root 31 Apr 22 00:40 kubelet
Step-2: The cluster does not use the 'SecurityContextDeny' admission controller
cat /etc/kubernetes/apiserver | grep KUBE_ADMISSION_CONTROL
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota"
Step-3: Explicitly set the parts of the SELinux context that you want, in the pod's security context
$ cat busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
    - sleep
    - "36000"
    volumeMounts:
    - name: storage
      mountPath: /storage
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"
  volumes:
  - name: storage
    emptyDir: {}
# docker inspect k8s_busybox.cd717296_busybox_default_05309069-0bca-11e6-9fd9-525400ed468d_3061e077 | grep -i mount
"MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c123,c456",
# ls -Zl /var/lib/kubelet/pods/05309069-0bca-11e6-9fd9-525400ed468d/volumes/
total 0
drwxr-xr-x. 3 system_u:object_r:svirt_sandbox_file_t:s0 root root 20 Apr 26 12:14 kubernetes.io~empty-dir
drwxr-xr-x. 3 system_u:object_r:svirt_sandbox_file_t:s0 root root 32 Apr 26 12:14 kubernetes.io~secret
I can still see that it is not relabeled; is that expected?
Yesterday I again had a conversation with @pmorie, and we were able to come up with workaround steps for it.
Step-1: The volume dir has the SELinux type svirt_sandbox_file_t
# ls -Zl /var/lib/ | grep kubelet
drwxr-xr-x. 4 system_u:object_r:svirt_sandbox_file_t:s0 root root 31 Apr 22 00:40 kubelet
Step-2: The cluster does not need to use the 'SecurityContextDeny' admission controller
cat /etc/kubernetes/apiserver | grep KUBE_ADMISSION_CONTROL
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota"
Step-3: Explicitly set the parts of the SELinux context that you want, in the pod's security context
$ cat busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    name: busybox
    command:
    - sleep
    - "36000"
    volumeMounts:
    - name: storage
      mountPath: /storage
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"
  volumes:
  - name: storage
    emptyDir: {}
# docker inspect k8s_busybox.cd717296_busybox_default_05309069-0bca-11e6-9fd9-525400ed468d_3061e077 | grep -i mount
"MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c123,c456",
# ls -Zl /var/lib/kubelet/pods/4f5704d4-0c8b-11e6-9056-5254006f8934/volumes/kubernetes.io~empty-dir/
total 0
drwxrwxrwx. 2 system_u:object_r:svirt_sandbox_file_t:s0:c123,c456 root root 6 Apr 27 11:18 storage
We need to put these steps in the documentation so users can refer to them.
@LalatenduMohanty, @kadel We now have a workaround for this issue, and it should be documented.
@praveenkumar I want to fix it before June 24th 2016 (most likely in vagrant-service-manager). Assigning it to myself.
@LalatenduMohanty this is mostly on the documentation side; there is a very small change required on the kickstart side, and it should be done before 7 June so it can be part of the 2.1 release.
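For reference, the kickstart-side change discussed in this thread might be sketched as follows (an untested illustration; it assumes /var/lib/kubelet exists by the time %post runs):

```
%post
# Label the kubelet tree so containers can access pod volumes; pods still
# get private MCS categories via the pod-level seLinuxOptions workaround.
chcon -Rt svirt_sandbox_file_t /var/lib/kubelet || true
%end
```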