aws-efs-csi-driver

Unable to deploy efs-csi-controller to Fargate to support Karpenter-provisioned EKS cluster

Open Nuru opened this issue 1 year ago • 16 comments

/kind bug

What happened?

  • I am using Terraform to manage AWS resources.
  • I tried to deploy, via Terraform, an EKS cluster with no nodes, but with the EFS CSI Add-On (and others). Nodes to be provisioned by Karpenter. The Karpenter controller itself is deployed to Fargate.
    • Karpenter provisions EC2 nodes on demand to run Kubernetes Pods.
    • I want the Pods (on EC2, provisioned by Karpenter) to have access to EFS.
    • Terraform fails to deploy the EKS cluster because the EFS Add-On never becomes ready (reports status as "Degraded"). I believe this is similar to EBS CSI ISSUE #1801: the controller pods need to be running for the Add-On to report being healthy, but they have no place to run.
  • I added a Fargate profile, targeting label app = "efs-csi-controller", so that the EFS controller would be launched to Fargate.
  • The Add-On still would not become healthy because the communication sockets were not created/available, and still reports status as "Degraded".
  • After Karpenter was deployed, it started nodes, and the efs-csi-node DaemonSet successfully deployed to the EC2 nodes, but the efs-csi-controller Pods were still in CrashLoopBackOff and the Add-On still reported status as "Degraded".
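
For readers who want to reproduce the Fargate profile step outside Terraform, here is a minimal sketch in Python using boto3's EKS `create_fargate_profile` call. The profile name, the `kube-system` namespace, and the function names are assumptions for illustration; only the `app = "efs-csi-controller"` label comes from the report above. As the report describes, creating this profile alone did not make the Add-On healthy.

```python
def efs_controller_fargate_profile(cluster_name, pod_execution_role_arn, subnet_ids):
    """Build keyword arguments for the EKS CreateFargateProfile API."""
    return {
        "fargateProfileName": "efs-csi-controller",  # hypothetical profile name
        "clusterName": cluster_name,
        "podExecutionRoleArn": pod_execution_role_arn,
        "subnets": list(subnet_ids),  # Fargate profiles accept private subnets only
        "selectors": [
            {
                # Assumption: the add-on installs the controller into kube-system.
                "namespace": "kube-system",
                # Label selector taken from the issue text.
                "labels": {"app": "efs-csi-controller"},
            }
        ],
    }


def create_profile(cluster_name, pod_execution_role_arn, subnet_ids):
    """Send the request; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above works without boto3

    request = efs_controller_fargate_profile(
        cluster_name, pod_execution_role_arn, subnet_ids
    )
    return boto3.client("eks").create_fargate_profile(**request)
```

Keeping the request builder separate from the API call makes it possible to inspect the selector without touching AWS.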

What you expected to happen?

The controller pods would be deployed to Fargate and work without the Node component, and the Add-On would report status as "Active". As EC2 Nodes were provisioned, controller Pods would work from Fargate while Node Pods worked properly on EC2 Nodes.
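
To check which status the Add-On is actually reporting, something like the following sketch may help. It assumes boto3 and the managed add-on name `aws-efs-csi-driver`; the helper names are hypothetical. `addon_health` only reads the `status` and `health.issues` fields of a `describe_addon` response.

```python
def addon_health(response):
    """Return (status, issue summaries) from an EKS describe_addon response."""
    addon = response["addon"]
    issues = addon.get("health", {}).get("issues", [])
    summaries = [f'{issue.get("code")}: {issue.get("message")}' for issue in issues]
    return addon["status"], summaries


def fetch_addon(cluster_name, addon_name="aws-efs-csi-driver"):
    """Query EKS for the add-on; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so addon_health works without boto3

    return boto3.client("eks").describe_addon(
        clusterName=cluster_name, addonName=addon_name
    )
```

A call such as `addon_health(fetch_addon("my-cluster"))` (cluster name is a placeholder) returns the status string, for example `DEGRADED` or `ACTIVE`, together with any health issues EKS lists.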

How to reproduce it (as minimally and precisely as possible)?

See "What happened" above.

Anything else we need to know?:

The failure that is reported to Kubernetes comes from the efs-plugin container exiting with an error. IMHO the efs-plugin container should not try to run on Fargate, and probably should not be deployed as part of the controller for this reason.

Environment

  • Kubernetes version (use kubectl version): v1.27.4-eks-2d98532
  • Driver version: v1.5.8-eksbuild.1

Please also attach debug logs to help us better diagnose

Log excerpts (each one just keeps repeating the quoted excerpt):

efs-csi-controller csi-provisioner

W0816 04:26:59.779601       1 connection.go:183] Still connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock

efs-csi-controller liveness-probe

W0816 04:27:00.989300       1 connection.go:173] Still connecting to unix:///csi/csi.sock

efs-csi-controller efs-plugin

I0816 05:54:46.413768       1 config_dir.go:63] Mounted directories do not exist, creating directory at '/etc/amazon/efs'
I0816 05:54:46.418766       1 metadata.go:63] getting MetadataService...
I0816 05:54:52.757469       1 metadata.go:71] retrieving metadata from Kubernetes API
F0816 05:54:52.773395       1 driver.go:56] could not get metadata: did not find aws instance ID in node providerID string
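
The fatal error above suggests the plugin looks for an EC2 instance ID in the node's `spec.providerID` and does not find one on a Fargate node. The sketch below illustrates that kind of check. The regular expression is an illustration and is not copied from the driver, and both example providerID strings in the usage note are made-up values; the Fargate one assumes the usual `fargate-ip-...` node naming.

```python
import re

# Illustrative pattern (an assumption, not the driver's own code):
# an EC2 instance ID such as "i-0123456789abcdef0" at the end of the string.
_INSTANCE_ID = re.compile(r"i-[0-9a-f]+$")


def instance_id_from_provider_id(provider_id):
    """Return the trailing EC2 instance ID, or None when there is none."""
    match = _INSTANCE_ID.search(provider_id)
    return match.group(0) if match else None
```

With a made-up EC2 value like `aws:///us-east-1a/i-0123456789abcdef0` the function returns the instance ID. With a made-up Fargate-style value like `aws:///us-east-1a/abc123/fargate-ip-10-0-1-23.ec2.internal` it returns `None`, which corresponds to the "did not find aws instance ID" case.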

Nuru avatar Aug 16 '23 06:08 Nuru

I also have the same issue. I would like to run the controllers on Fargate and have them attach EFS volumes to actual nodes that are then provisioned by Karpenter.

apenney avatar Sep 22 '23 19:09 apenney

#1195 isn't sufficient for Fargate support. The latest EKS add-on, v1.7.6-eksbuild.1, sets securityContext.privileged: true for controller pods, which Fargate does not support.

Please reopen.
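
A quick way to see which containers would trip the Fargate restriction on privileged containers is to scan the pod spec for `securityContext.privileged: true`. This is a minimal sketch: the function name is hypothetical and the spec is assumed to be a plain dict, for example the parsed output of `kubectl get pod -o json` under `.spec`.

```python
def privileged_containers(pod_spec):
    """List names of containers that request securityContext.privileged: true."""
    names = []
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key) or []:
            security_context = container.get("securityContext") or {}
            if security_context.get("privileged") is True:
                names.append(container["name"])
    return names
```

An empty result means no container in the spec asks for privileged mode; any name in the list marks a container that Fargate would not admit as configured.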

z0rc avatar Feb 26 '24 12:02 z0rc

/reopen

z0rc avatar Feb 26 '24 12:02 z0rc

@z0rc: You can't reopen an issue/PR unless you authored it or you are a collaborator.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 26 '24 12:02 k8s-ci-robot

@Nuru could you reopen the ticket please?

z0rc avatar Feb 26 '24 12:02 z0rc

/reopen

It looks like the changes in #1195 were necessary, but not sufficient.

Nuru avatar Feb 27 '24 18:02 Nuru

@Nuru: Reopened this issue.

k8s-ci-robot avatar Feb 27 '24 18:02 k8s-ci-robot

Just fell into the same situation: I can't deploy the add-on because kube-system is a Fargate namespace. Same context: Karpenter + Fargate cluster. I will switch to manual installation, but that seems like a waste of time. Allowing the controllers to run on Fargate would be great, thanks.

sogos avatar Apr 25 '24 16:04 sogos

We're facing the same issue as the previous commenter.

skraga avatar Apr 30 '24 10:04 skraga