
Dynamic provisioning seems to ignore subPathPattern

Open laconictae opened this issue 2 years ago • 6 comments

/kind bug

I've implemented the dynamic access point provisioning approach as outlined here: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/examples/kubernetes/dynamic_provisioning/README.md. However, I can't seem to get the subPathPattern parameter to have any effect. The access points are created successfully when I create a new PVC, but they are created in EFS at a path that is just the PVC's volume name:

(screenshot: access points created at paths matching the PVC volume names)

My storage class is configured as follows:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs
parameters:
  directoryPerms: "700"
  ensureUniqueDirectory: "False"
  fileSystemId: <efs_filesystem_id>
  provisioningMode: efs-ap
  subPathPattern: "${.PVC.namespace}/${.PVC.name}"
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

A sample PVC I created is:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app
  namespace: baseline
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: efs

With this I would expect the path in EFS to be /baseline/app but instead it is /pvc-51402212-7b56-4841-bab5-9b34bdb31b4f (the dynamically provisioned volume name).

laconictae avatar Nov 01 '23 18:11 laconictae

Hi @laconictae, are you using v1.7.0? This new feature was released in v1.7.0.
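For example, you can confirm the installed driver version by reading the controller's image tag. This is only a sketch: the deployment and namespace names assume the default Helm install, and the image URI below is just an example value standing in for what the cluster would return.

```shell
# Read the controller image to confirm the driver version.
# Uncomment to query a live cluster (names assume the default Helm install):
# IMAGE=$(kubectl -n kube-system get deployment efs-csi-controller \
#   -o jsonpath='{.spec.template.spec.containers[0].image}')
IMAGE="602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-efs-csi-driver:v1.7.0"  # example value
TAG="${IMAGE##*:}"   # everything after the last ':' is the version tag
echo "$TAG"
```

The subPathPattern parameter has no effect on tags earlier than v1.7.0.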

mskanth972 avatar Nov 02 '23 18:11 mskanth972

Hello @mskanth972, and thanks for the response! As it happens I was on v1.6.0 and upgraded to v1.7.0 late yesterday afternoon, which (after a lot of trial and error with ensureUniqueDirectory and reuseAccessPoint) got me the EFS paths I wanted. However, I now have a new problem, and I'm wondering whether there is a workaround or whether I'm just not understanding the paradigm here...

I have paths in EFS like the following:

/namespace_a/application_a
/namespace_a/application_b
/namespace_b/application_a

The issue I'm now running into: the access points are created successfully, each with a POSIX user/group ID from the pool specified by the gidRangeStart/gidRangeEnd parameters on my storage class. All well and good. However, the mounted volume on my pods has completely wrong ownership. For instance, Application A in Namespace A might have a dynamically provisioned access point with POSIX user 6999999. My pod for Application A in Namespace B clearly references a different dynamically provisioned access point, with POSIX user 6999998, yet inside that pod the volume is mounted as owned by 6999999.

I was completely confused by this, and I'm not entirely sure it isn't a bug in EFS itself. I manually mounted the access points on a node to explore the permissions, and although I have three separate access points with different POSIX users (6999998, 6999999, 7000000), they all mounted as owned by user 6999999. This leaves my pods without permission to their mounted volumes, which is obviously not ideal.
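For reference, this is roughly how I've been cross-checking the provisioned access points against the UIDs I actually see in the pods. The AWS CLI call is commented out (it needs credentials and a real filesystem id); the sample data below just reproduces the duplicate-UID situation I described:

```shell
# List each access point's root path and POSIX UID (filesystem id is a placeholder):
# aws efs describe-access-points --file-system-id <file_system_id> \
#   --query 'AccessPoints[].[RootDirectory.Path,PosixUser.Uid]' --output text
# Sample output fed through awk to flag any UID shared by multiple paths:
REUSED=$(printf '%s\n' \
  '/namespace_a/application_a 6999999' \
  '/namespace_a/application_b 6999998' \
  '/namespace_b/application_a 6999999' \
| awk '{count[$2]++; paths[$2] = paths[$2] " " $1}
       END {for (u in count) if (count[u] > 1) print "UID " u " reused by:" paths[u]}')
echo "$REUSED"
```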

Interestingly, at one point, with the provisioner crashing due to some idiocy on my part, I had a queue of pending PVCs which all got provisioned simultaneously once I corrected the provisioner configuration. That created three access points in EFS, all with the same POSIX user despite the three separate paths. In that particular case all the pods were happy and had access to their mounted volumes. However, I can't control which POSIX user is used for the access points; I can only provide a range.

I am still digging at this point and trying to understand how this stuff even works. I'll admit this area is really not my forte, so I'm bumbling around a lot, but I'm starting to wonder if what I'm trying to do is an anti-pattern and I ought to just admit defeat and use the static access point approach instead.
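For the record, the static approach I'd fall back to would look something like this. This is only a sketch based on the driver's static provisioning example; the name and both IDs are placeholders, and the access point would be created manually in EFS beforehand:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-static
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  csi:
    driver: efs.csi.aws.com
    # <file_system_id>::<access_point_id> mounts through a pre-created access point,
    # so the path and POSIX user are fixed up front rather than provisioned dynamically
    volumeHandle: <file_system_id>::<access_point_id>
```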

laconictae avatar Nov 02 '23 19:11 laconictae

Hi @laconictae I am not sure I understand the issue you are experiencing. Is the issue that you would like to mount a volume with a particular UID but you aren't getting the UID you expect?

Could you please provide the storage class you are using as well as a list of steps to recreate the issue?

seanzatzdev-amazon avatar Nov 06 '23 15:11 seanzatzdev-amazon

Hello @seanzatzdev-amazon - I don't really mind which UID is used, but exec-ing into pods revealed that their EFS volumes were mounted with ownership for a different POSIX user than the one specified on the access point, resulting in permission-denied errors just trying to ls the mounted volume inside the pod.

I suspect that, with the various things I tried while experimenting with dynamic provisioning, I may have put my EFS filesystem in a bad state. I created a new EFS filesystem and re-provisioned the access points there, and that has (seemingly) been working fine so far. I do still see some weirdness when multiple access points are provisioned simultaneously - they end up with the same POSIX user - but that doesn't break anything for me.

What I have set up now for the storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-dynamic
parameters:
  directoryPerms: "750"
  ensureUniqueDirectory: "False"
  fileSystemId: <file_system_id>
  gidRangeEnd: "7000000"
  gidRangeStart: "6999001"
  provisioningMode: efs-ap
  reuseAccessPoint: "False"
  subPathPattern: "${.PVC.namespace}/${.PVC.name}"
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

This gives me a predictable path within EFS. If I do not disable the unique-directory setting, a UUID for the PVC is appended to the path, and I have no apparent way to get back to that directory if I re-deploy the application (the access point is removed, then a new access point is created with a new UUID). I tried setting reuseAccessPoint to true, but then the same access point was used for two different PVCs in different namespaces, and the provisioner just kept updating the access point based on whichever was deployed most recently.

So this seems to work, but I'm worried about getting into a mess again: after removing and re-deploying applications, the provisioner could create an access point with a previously used path but a different POSIX user than was originally assigned, and I suspect that would result in permission errors again. Maybe my storage class reclaimPolicy should be Retain?

laconictae avatar Nov 06 '23 16:11 laconictae

Hi @laconictae Just to clarify, are you reporting an issue in the CSI Driver code? We use this GitHub Issues page to track things like community-reported bugs, and thus it may not be the best place to get support on this problem.

ensureUniqueDirectory is a storage class feature meant to prevent conflicting access point directory paths, so you may want to consider re-enabling this feature.
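As an illustration only (adapted from the storage class you posted, not a tested recommendation), that combination might look like:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-dynamic
parameters:
  directoryPerms: "750"
  ensureUniqueDirectory: "true"   # prevents conflicting access point directory paths
  fileSystemId: <file_system_id>
  gidRangeStart: "6999001"
  gidRangeEnd: "7000000"
  provisioningMode: efs-ap
  subPathPattern: "${.PVC.namespace}/${.PVC.name}"
provisioner: efs.csi.aws.com
reclaimPolicy: Retain   # the PV, access point, and data survive PVC deletion
volumeBindingMode: Immediate
```

Whether Retain is appropriate depends on how you want cleanup to behave when applications are removed.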

seanzatzdev-amazon avatar Nov 10 '23 21:11 seanzatzdev-amazon

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 26 '24 13:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 27 '24 14:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Apr 26 '24 15:04 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 26 '24 15:04 k8s-ci-robot