aws-efs-csi-driver icon indicating copy to clipboard operation
aws-efs-csi-driver copied to clipboard

Allow more control of the name of the directory created by the Access Point under Dynamic Provisioning

Open jonathanrainer opened this issue 3 years ago • 13 comments

This is a continuation of #568, as that PR was opened incorrectly and messing around with it was much more trouble than just opening a new PR

Is this a bug fix or adding new feature? This adds a new feature, mostly in response to discussions on Issue https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/543

What is this PR about? / Why do we need it? At the moment if you use dynamic provisioning the name of the folder created on the EFS is the name of the generated PV however in some use cases this isn't desirable and leads to a lot of confusion. Under this scheme the name of the directory can be controlled by users using a new parameter subPathPattern. This allows users a higher level of control over the directories that are created as they can specify any combination of fixed strings and a set of predefined variables based on what the CSI Spec supports. To stop this causing problems with security, there is additionally a new featureFlag added called ensureUniqueDirectories which appends a UUID to the created structure so clashes will be avoided. This is true by default but allows users to explicitly opt into an unsafe mode if they wish to provide the guarantee that there will be not be clashes themselves rather than having the driver enforce it for them.

What testing is done? I have added to the unit tests for the module and I believe this is sufficient because all that is being changed is the directory being added to the AccessPointOpts struct. There might want to be some E2E testing about what happens if that directory already exists but I'm happy to take pointers on that as this is my first PR to the repo.

fixes #543

jonathanrainer avatar Feb 23 '22 06:02 jonathanrainer

Hi @jonathanrainer. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 23 '22 06:02 k8s-ci-robot

Right, sorry about the PR wrangling, but now we're back I can continue the conversation from #568.

@iAnomaly, @wolffberg - This is where the PR continues. @wolffberg I've addressed your points about validating the created EFS path by explicitly validating it before we send it off the API and have added tests to that effect.

@wongma7 - I had a long think about your comments re: security and think I've come up with a compromise. My general view is always that things should be secure by default but should allow you to turn off those controls in the knowledge that you're explicitly opting in to having to manage those guarantees yourself. So to that end I've introduced another parameter ensureUniqueDirectories which adds a UUID to the end of any directory path specified to ensure that directories are unique. This is on by default so I imagine most users won't notice. You can turn this off to get the behaviour originally specified but throughout the README's and examples it makes it clear that then you are opting in to the responsibility to manage the guarantees yourself. This way it's not a surprise for users and it would require you to make a bad change in two places for something bad to happen.

I hope this is satisfactory and if not I'm more than happy to have a discussion about it. Further I realise some of the Go may not be 100% correct as I'm fairly new to the language and some of the most common idioms still elude me so feel free to comment away but hopefully this is a much better proposition than that proposed previously.

jonathanrainer avatar Feb 23 '22 06:02 jonathanrainer

This is awesome @jonathanrainer, really appreciate all your time on this and all the cases you covered with tests. I went looking for the omitted basePath, empty subPathPattern case (which achieves the AWS EFS Console default behavior of just / as the root path for the Access Point) and sure enough you've got that in there.

Thanks again, this will be an awesome flexible feature once merged.

iAnomaly avatar Apr 11 '22 20:04 iAnomaly

/assign wongma7

jonathanrainer avatar Apr 24 '22 09:04 jonathanrainer

/ok-to-test

I will not have time to review for a while but in principle yes I want this merged, thanks for patience

wongma7 avatar Apr 27 '22 19:04 wongma7

I've run into a use case where these changes would be highly desirable. Thanks team!

thomaspaulin avatar Jun 01 '22 11:06 thomaspaulin

Just to prove this actually works, I spun this up on an EKS cluster with the following StorageClass

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-06-12T15:46:35Z"
  labels:
    provisioning-type: dynamic
  name: efs-dynamic
  resourceVersion: "16864"
  uid: 983866ac-c731-45ac-a22d-1d1edaf7dbda
parameters:
  basePath: ""
  directoryPerms: "777"
  ensureUniqueDirectory: "false"
  fileSystemId: fs-05427a7b4af6c1fcc
  gidRangeEnd: "2000"
  gidRangeStart: "1000"
  provisioningMode: efs-ap
  subPathPattern: /dynamic/${.PVC.name}/foo
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Which then generated the following on disk: Screenshot 2022-06-12 at 16 53 07

So my Unit Test predictions were good and it also showed the value of all the error handling as I got the StorageClass wrong a few times and every time my error handling directed me to the issue quickly.

jonathanrainer avatar Jun 12 '22 15:06 jonathanrainer

Hey @jonathanrainer

Here at YData we are testing your solution, and fix a big issue for our problem, so thank you and hope this PR get merged soon in order to use the official version.

Our storage class looks like this:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: sc-new
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxx
  directoryPerms: "0770"
  gid: 1000
  uid: 1000
  subPathPattern: "/efs"
  ensureUniqueDirectory: "false"
reclaimPolicy: Retain
volumeBindingMode: Immediate

About the issue: https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/715

We are testing, and we notice that new access points are created each time. For example, here where we have multiple PVC's and each PVC using the same directory.

image

About this issue: https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/516

We found out, and maybe is worthy of put in the docs, is that with SubPath we can't write to the root path:

SubPaths in the pods will not work with:

  subPathPattern: "/"
  ensureUniqueDirectory: "false"

But after setting to subPathPattern: "/efs"or any other path everything works great.

andre-lx avatar Jun 28 '22 10:06 andre-lx

Hi @andre-lx, thanks for the very kind words,

With regards your first point, I'd never really considered that access points could be re-used, but you make a very fair point. I'll need to be a bit careful about that intersects with other features, for example if you use the GID range features then you would need a new access point each time because each one would be for a different GID but I don't think it's beyond the realms of possibility to check that is the case, it's also probably a case where adding an E2E test would be a good idea so thanks for spotting that.

In regards the second point that seems like a bug/gap in my unit testing. I'm going to have a good look at that and hopefully they'll be a fix fairly soon. I'm quite busy this week but hopefully will get a chance to look over this at the weekend and will post the results of my findings when I have them.

Thanks so much for taking an interest, and I look forward to getting to the bottom of these issues.

jonathanrainer avatar Jun 28 '22 12:06 jonathanrainer

Hi @andre-lx,

I've looked into both the issues you've raised and I'll respond to them in turn:

  1. With the issue around setting
subPathPattern: "/"
ensureUniqueDirectory: "false"

I've run some tests and it appears the code is operating exactly as I'd expect, i.e. the request to AWS requests an accessPoint that's root directory is "/". What error are you seeing when this happens? And further what use case does this service? Setting the subPath pattern like that would simply allow you to gain access to the root of the directory, which I guess is useful, but if that's what you want then why not just create a PV/PVC statically and mount that PVC into multiple pods? It seems simpler and like it would accomplish what you want without needing this feature. This might also be helped by the work done in #732.

  1. With the issue around re-use of access points, I've had a look into this and I'm not sure it's a good idea, for one thing it introduces the need for concurrency controls and entails quite a lot of book-keeping on the drivers side to make sure the set of access points is correct with respect to the cluster. You have to guarantee that you can lock access to the AWS API so that two concurrent requests from a PVC end up sharing one and also you'd have to ensure that when one PVC got deleted the access point didn't disappear from the other PVCs that were using it.

However I think that the recent PR I've raised in #732 might help you out with the problem you're having, and that one combined with this one will give a similar kind of functionality without depleting the access points available. If this is something that would still be useful then it's probably something a future PR could tackle but I think it's scope creep for this current PR.

jonathanrainer avatar Jul 01 '22 14:07 jonathanrainer

Hi @jonathanrainer, since you plan to work and finalize #732, do you plan to finalize this PR as well? That would be a very nice evolution

ChamMach avatar Sep 30 '22 15:09 ChamMach

@ChamMach Yer that's the plan, I want to get both merged but obviously need reviews and stuff

jonathanrainer avatar Sep 30 '22 15:09 jonathanrainer

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iAnomaly, jonathanrainer Once this PR has been reviewed and has the lgtm label, please ask for approval from wongma7 by writing /assign @wongma7 in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Sep 30 '22 20:09 k8s-ci-robot

Hey! This is highly desirable by my side... Any time frame on it going merged? Or anyway of using this feature as a tester @jonathanrainer ? @andre-lx maybe you can describe what you did to test the solution? (i.e. build Dockerfile etc.) That would really help me and my case 😄

EDIT: Managed to get it to work (sorry for my noobishness). My use case was that a app from vendor required a storageClass but the problem was that every iteration of the app or version bump a new namespace was created, therefore creating a new EFS AP without the needed files on the previous AP path. This ensured that there was no data lost across every iteration or namespace.

I'm still in the process of finishing the implementation but I'm fairly confident that it should work. Thank you for this! 🥂

gfafonso avatar Oct 19 '22 17:10 gfafonso

Hello, any news on merging this? This will save a lot of trouble. Currently sharing an EFS access point for shared services let's say in a cluster, leads us to create a PV and a PVC in each of the 30 namespaces (one per microservice) that reside in a cluster.

If this could get merged, we can use a shared services storage class with dynamic provisioning instead and a fixed subpath so we could create all those dynamically using only the PVC's and reducing the clutter.

tgeo-cambrian avatar Nov 08 '22 09:11 tgeo-cambrian

Any news on this one? This will also save us loads of work

luismy avatar Nov 11 '22 12:11 luismy

@Ashley-wenyizha Sorry I've been so long to sort this out, am really eager to get this merged soon if we can and happy to do whatever's necessary to make that happen :)

jonathanrainer avatar Feb 04 '23 06:02 jonathanrainer

Exciting times! So looking forward to this!

luismy avatar Feb 08 '23 10:02 luismy

Hello, any news on it being merged? This piece would make our life so much easier!

BornaMaticic avatar Mar 20 '23 13:03 BornaMaticic

Are there any updates? I'm waiting to be merged.

jz0912 avatar Mar 30 '23 05:03 jz0912

Hi @Ashley-wenyizha, I know you said https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/732#issuecomment-1536595665 that this issue would be progressing as well, has there been any further progress? More than happy to help in any way I can :)

jonathanrainer avatar May 29 '23 06:05 jonathanrainer

Thought it would be useful just to point out this PR resolves a Disaster Recovery issue that can arise when using dynamic provisioning.

Consider - a cluster running disparate workloads, with dynamic PVC's defined in mutiple repo's. Now imagaine the cluster completely dies and requires rebuilding all the deployments from manifests. The present dynamic provisioning model would be very awkward to reconnect the EFS state directories back up to the correct deployment. You'd need to modify all the PVC's (in all their different repos) and essentially convert them to static provisioning to define the path where the files still exist. In which case you may as well just start with static provisioning model....

Thanks to the people who have put considerable effort into this PR - hope it gets merged soon.

SpoddyCoder avatar Jul 07 '23 14:07 SpoddyCoder

Hi Jonathan @jonathanrainer,

Thanks for the patience and apologize about the delay, this PR lgtm, could you give it a rebase and squash into 1-3 commits? We can merge this afterwards and you can continue to revise PR 732

thanks!

Ashley-wenyizha avatar Jul 21 '23 21:07 Ashley-wenyizha

bump!

harshad-jpmc avatar Aug 09 '23 17:08 harshad-jpmc

Hi Jonathan @jonathanrainer,

this PR lgtm, could you give it a pull + rebase and squash into 1-3 commits? We can merge this afterwards and you can continue to revise PR 732

thanks!

Ashley-wenyizha avatar Aug 11 '23 20:08 Ashley-wenyizha

@Ashley-wenyizha Hi, sorry for the delay, real life stuff got a bit on top of me! But have done what you wanted now and this is ready to merge as and when you'd like :)

jonathanrainer avatar Aug 12 '23 16:08 jonathanrainer

@Ashley-wenyizha FYI I had to make a small change to the sanity tests because the EFS API Request won't accept paths more than 100 characters but the sanity attempts to use a volume name of 128 characters. In this case I think it's fine because this is the mechanism by which you'd get around exactly this problem but if you think there's a better solution then let me know

jonathanrainer avatar Aug 12 '23 17:08 jonathanrainer

/lgtm /approve

Ashley-wenyizha avatar Aug 15 '23 02:08 Ashley-wenyizha

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ashley-wenyizha, iAnomaly, jonathanrainer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [Ashley-wenyizha]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Aug 15 '23 02:08 k8s-ci-robot

This is fantastic. Any idea what version we can expect this feature to be available in?

Thanks for all the work that went into this.

joshuaganger avatar Aug 23 '23 14:08 joshuaganger

@jonathanrainer It looks like the new properties (subPathPattern, ensureUniqueDirectory) were released with helm chart 2.4.9, but not in the driver appVersion 1.6.0 that this helm chart uses.

When will these properties be supported in a released appVersion?

Also, note the typo in the helm docs: s/ensureUniqueDirectories/ensureUniqueDirectory/

joebowbeer avatar Sep 18 '23 16:09 joebowbeer