aws-efs-csi-driver
securityContext is not applied when creating a PVC
/kind bug
What happened?
Deploying the HashiCorp Vault Helm chart, which includes the securityContext
below, results in the volume being mounted without the specified uid/gid (the /vault/data folder is created by root):
securityContext:
runAsNonRoot: true
runAsGroup: {{ .Values.server.gid | default 1000 }}
runAsUser: {{ .Values.server.uid | default 100 }}
fsGroup: {{ .Values.server.gid | default 1000 }}
What you expected to happen?
The EFS persistent volume should be mounted with the specified securityContext attributes.
How to reproduce it (as minimally and precisely as possible)?
Deploy the Vault Helm chart, declaring a storageClass that represents EFS.
Environment
- Kubernetes version: v1.15.3
- Driver version: 0.2.0
Thanks for reporting the issue. What uid/gid do you see in the container when the securityContext is not applied when creating the PVC? And I'm assuming you are setting the securityContext on the CSI driver? Could you clarify with more info, like the pod spec and logs?
Thanks for the response @leakingtapan, the uid/gid of the container is:
fsGroup: 1000
runAsGroup: 1000
runAsNonRoot: true
runAsUser: 100
pod spec:
kind: Pod
metadata:
annotations:
cni.projectcalico.org/podIP: 10.42.133.210/32
creationTimestamp: "2020-02-10T14:09:40Z"
generateName: vault-
labels:
app.kubernetes.io/instance: vault
app.kubernetes.io/name: vault
component: server
controller-revision-hash: vault-7f64cccd79
helm.sh/chart: vault-0.3.3
statefulset.kubernetes.io/pod-name: vault-0
name: vault-0
namespace: vault
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: vault
uid: 77540e58-ec01-4129-a98a-4f0f05b45da3
resourceVersion: "3692437"
selfLink: /api/v1/namespaces/vault/pods/vault-0
uid: 7325dc66-77fc-47d7-bafb-d1e061a2da35
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/instance: vault
app.kubernetes.io/name: vault
component: server
topologyKey: kubernetes.io/hostname
containers:
- args:
- "sed -E \"s/HOST_IP/${HOST_IP?}/g\" /vault/config/extraconfig-from-values.hcl
> /tmp/storageconfig.hcl;\nsed -Ei \"s/POD_IP/${POD_IP?}/g\" /tmp/storageconfig.hcl;\n/usr/local/bin/docker-entrypoint.sh
vault server -config=/tmp/storageconfig.hcl \n"
command:
- /bin/sh
- -ec
env:
- name: HOST_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: VAULT_ADDR
value: http://127.0.0.1:8200
- name: VAULT_API_ADDR
value: http://$(POD_IP):8200
- name: SKIP_CHOWN
value: "true"
- name: SKIP_SETCAP
value: "true"
image: vault:1.3.1
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- kill -SIGTERM $(pidof vault)
name: vault
ports:
- containerPort: 8200
name: http
protocol: TCP
- containerPort: 8201
name: internal
protocol: TCP
- containerPort: 8202
name: replication
protocol: TCP
readinessProbe:
exec:
command:
- /bin/sh
- -ec
- vault status -tls-skip-verify
failureThreshold: 2
initialDelaySeconds: 5
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
cpu: 300m
memory: 1Gi
requests:
cpu: 200m
memory: 1Gi
securityContext:
capabilities:
add:
- IPC_LOCK
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /vault/data
name: data
- mountPath: /vault/config
name: config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: vault-token-p2vsj
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostname: vault-0
nodeName: worker-2
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000
runAsGroup: 1000
runAsNonRoot: true
runAsUser: 100
serviceAccount: vault
serviceAccountName: vault
subdomain: vault
terminationGracePeriodSeconds: 10
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: data
persistentVolumeClaim:
claimName: data-vault-0
- configMap:
defaultMode: 420
name: vault-config
name: config
- name: vault-token-p2vsj
secret:
defaultMode: 420
secretName: vault-token-p2vsj
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-02-10T14:09:42Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2020-02-10T14:09:42Z"
message: 'containers with unready status: [vault]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2020-02-10T14:09:42Z"
message: 'containers with unready status: [vault]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2020-02-10T14:09:42Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://af5fd96eb6c829065094eb33c34661f5ed7f0fbb2a6b8a791c0dbd05429cfd31
image: vault:1.3.1
imageID: docker-pullable://vault@sha256:2f797433dfc322d7ba6fa81074e010873c740eb45e83ff5ced49cba585f82a66
lastState: {}
name: vault
ready: false
restartCount: 0
state:
running:
startedAt: "2020-02-10T14:09:45Z"
hostIP: 172.31.39.158
phase: Running
podIP: 10.42.133.210
qosClass: Burstable
startTime: "2020-02-10T14:09:42Z"
pod logs:
2020-02-10T14:09:45.837Z [WARN] storage migration check error: error="open /vault/data/core/_migration: permission denied"
WARNING! Unable to read storage migration status.
/vault folder
total 24K
drwxr-xr-x 1 vault vault 4.0K Feb 10 14:09 .
drwxr-xr-x 1 root root 4.0K Feb 10 14:09 ..
drwxrwsrwx 3 root vault 4.0K Feb 10 14:09 config
drwxr-x--- 2 root root 4.0K Feb 10 14:09 data
drwxr-xr-x 2 vault vault 4.0K Dec 19 04:26 file
drwxr-xr-x 2 vault vault 4.0K Dec 19 04:26 logs
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
/add-lifecycle frozen
@leakingtapan is this fixed or going to be fixed soon? Can we freeze its lifecycle?
/lifecycle frozen
Any news on this?
I'm also facing some issues with this when deploying netbox using the helm chart bootc/netbox. I am setting fsGroup: 1000, however these changes do not take effect on the pod itself.
I am unable to make an initContainer with root privileges since I am using AWS Fargate, so unfortunately my application will lack functionality.
I'm not so experienced with this sort of stuff, but if someone is able to offer some guidance on where to look then I'd be happy to try to propose a fix.
What is EFS CSI's official stance on this, i.e. is it expected that it will chown or not upon mount? The blog post on the new CSIDriver FSGroupPolicy API in Kubernetes 1.20 mentions that NFS volume types may not implement this.
Traditionally if your pod is running as a non-root user (which you should), you must specify a fsGroup inside the pod’s security context so that the volume can be readable and writable by the Pod. ... But one side-effect of setting fsGroup is that, each time a volume is mounted, Kubernetes must recursively chown() and chmod() all the files and directories inside the volume - with a few exceptions noted below. ... Although the previous section implied that Kubernetes always recursively changes permissions of a volume if a Pod has a fsGroup, this is not strictly true. For certain multi-writer volume types, such as NFS or Gluster, the cluster doesn’t perform recursive permission changes even if the pod has a fsGroup. Other volume types may not even support chown()/chmod(), which rely on Unix-style permission control primitives.
So it seems like the EFS CSI driver is perfectly within spec to not chown/chmod upon mount? But if that's the expected behavior, it would be nice to document that (and even nicer to have the option to chown/chmod).
What is EFS CSI's official stance on this, i.e is it expected that it will chown or not upon mount?
I think https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/91 may be the issue for documenting this.
Sounds like EFS CSI spec needs to be updated to enforce the securityContext, probably due to areas evolving in parallel. Note issue #91 has been open since Sep 2019. A year and a half later, production deployments with EFS need this enforcement, rather than a workaround.
Or k8s will take care of it via https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods
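That page is about fsGroupChangePolicy; a sketch of how it would be set on a pod is below (names are hypothetical). Note it only controls when kubelet re-chowns a volume; it doesn't help if the volume type skips fsGroup handling entirely, which is the question in this issue:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo                       # hypothetical name
spec:
  securityContext:
    runAsUser: 100
    runAsGroup: 1000
    fsGroup: 1000
    fsGroupChangePolicy: "OnRootMismatch"  # only re-chown when the volume root ownership mismatches
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc                # hypothetical claim
```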
I'm having the same issue: although fsGroup is specified, the EFS-mounted directory doesn't have the correct permissions for the user to read/write.
Any updates on this?
Great. So this issue, combined with https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/300, means that I can neither set fsGroup on the pod's securityContext nor chown the folder after the fact via an initContainer command.
This effectively leaves our mounted volumes unusable.
It looks like the ugly hack in (https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/300#issuecomment-984818028) is the only 100% working solution for now. Pity.
Facing the same issue. The volume is mounted to a folder with 755 permissions owned by uid:0 / gid:0, not respecting fsGroup, so it's only writable by root. However, running the container as root violates our security policy.
I think this is similar to vsphere-csi's issue: kubelet ignores fsGroup if fsType is empty. So setting csi.storage.k8s.io/fstype: ext4 in the storage class parameters should fix the problem, but I can't test it myself.
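For reference, a minimal sketch of what that suggestion would look like for EFS dynamic provisioning. The fileSystemId is a placeholder, and whether the driver/external-provisioner combination actually honors the fstype hint here is exactly what remains untested:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0   # placeholder
  directoryPerms: "700"
  csi.storage.k8s.io/fstype: ext4      # the untested hint discussed above
```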
EDIT: After reading more, I think we can make fsGroup work without setting fsGroupPolicy (which needs K8s 1.20+), by simply having our dynamic PersistentVolume provisioner set a non-empty spec.csi.fsType field, as @saeed617 said in https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/125#issuecomment-1140459270.
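To show where that field lives, here is a hypothetical statically provisioned PV with a non-empty fsType (the file system ID is a placeholder); a dynamic provisioner would have to populate the same field:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv                            # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0    # placeholder file system ID
    fsType: ext4                          # non-empty, so kubelet's fsGroup handling is not skipped
```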
We should be able to fix this by setting fsGroupPolicy: File in our CSIDriver spec, which allows Kubernetes to change file permissions regardless of fstype.
This is how the NFS CSI Driver did it in 2021, see PR: https://github.com/kubernetes-csi/csi-driver-nfs/pull/206
Here are our CSIDriver files that need to change:
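As a rough illustration (not the driver's actual shipped manifest), a CSIDriver object with that policy looks like this:

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  attachRequired: false
  fsGroupPolicy: File   # lets kubelet apply fsGroup ownership regardless of fstype
```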
Any updates on this? I'm curious if this is the issue I am having with Jenkins, where their alpine images are not able to do a git checkout. Not being able to designate the UID and GID for the mount has put me up against a wall.
Strangely, their centos7 images do not have issues with this, even though the permissions are set for an unknown user/group.
Access Points can be used to solve this issue.
https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/examples/kubernetes/access_points/README.md
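With an access point that has a POSIX user configured on the EFS side (e.g. uid 100 / gid 1000, assumed values here) and an owned root directory, the ownership is enforced by EFS itself rather than by the pod's securityContext. A statically provisioned PV mounting through an access point looks roughly like this (both IDs are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-ap-pv                         # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    # volumeHandle format for access points: <FileSystemId>::<AccessPointId> (both placeholders)
    volumeHandle: fs-0123456789abcdef0::fsap-0123456789abcdef0
```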
Closing the issue, as the PR for enabling the security context at the container level is merged and will be included in the next release.
/close
@mskanth972: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Can someone confirm if this is fixed? I'm curious if setting a non-zero fsGroup for a pod with an EFS mount works now, as I'm setting it for my EBS volume and am hoping it will be compatible with EFS as well.