
Cannot Run with IAM Service Account and no metadata service

Open geofffranks opened this issue 4 years ago • 47 comments

/kind bug

What happened? The ebs-plugin container on the ebs-csi-controller crashes repeatedly while talking to the metadata service:

I0325 19:25:18.560010       1 driver.go:62] Driver: ebs.csi.aws.com Version: v0.6.0-dirty
panic: EC2 instance metadata is not available

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:83 +0x196
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0xc00016ff70, 0x3, 0x3, 0xc0000a88a0, 0xdcc3a0, 0xc0001edb00)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:87 +0x512
main.main()
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:31 +0x117

What you expected to happen?

When the AWS_REGION variable and an IAM role for the service account are specified, the ebs-csi-driver should not need to access the metadata service and should run on its own.

How to reproduce it (as minimally and precisely as possible)?

  1. Create an IAM role with permissions for the aws-ebs-csi-driver
  2. Create an EKS cluster with an OIDC identity provider trust relationship with IAM (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html), following in particular the final step of blocking pods from accessing the metadata service so they cannot assume the worker instance profile.
  3. Deploy aws-ebs-csi-driver using the alpha kustomize overlays, adding the eks.amazonaws.com/role-arn to the service account
  4. The ebs-csi-controller pods will start and crash after about 20s

Anything else we need to know?: As far as we can tell, everything is set up correctly with the role + service account, but the code explicitly tries to instantiate the metadata service, which is firewalled off. Could this be made optional when the region is set and credentials are available via the service account?
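
For reference, the IRSA setup described above comes down to annotating the controller's service account with the role ARN. A minimal sketch, using a placeholder account ID and role name (the service account name must match whatever the kustomize overlay references):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ebs-csi-controller-sa          # must match the serviceAccountName in the controller Deployment
  namespace: kube-system
  annotations:
    # placeholder ARN; substitute the IAM role created in step 1
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/ebs-csi-driver-role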

Environment EKS v1.14 ebs-csi-driver v0.5.0, v0.6.0-dirty

geofffranks avatar Mar 25 '20 19:03 geofffranks

How is your AWS_REGION specified? The metadata service is used as a fallback when AWS_REGION is not set through an environment variable.

leakingtapan avatar Mar 26 '20 21:03 leakingtapan

@leakingtapan I am having the same issue with the ebs-plugin: it is trying to access the metadata service even though AWS_REGION is defined as an environment variable. I've specified AWS_REGION the following way:

      containers:
        - name: ebs-plugin
          image: amazon/aws-ebs-csi-driver:latest
          args:
            # - {all,controller,node} # specify the driver mode
            - --endpoint=$(CSI_ENDPOINT)
            - --logtostderr
            - --v=5
          env:
            - name: AWS_REGION
              value: eu-central-1
...

If I follow the error log, I see it is trying to access the metadata service when creating a new nodeService; see here

I assume that when the AWS_REGION variable is specified, the driver should respect it. If that's the case, I am willing to create a PR for that.

franklevering avatar Mar 27 '20 13:03 franklevering

We've specified it as such:

          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
            # overwrite the AWS region instead of looking it up dynamically via the AWS EC2 metadata svc
            - name: AWS_REGION
              value: us-east-1

We have also tried AWS_DEFAULT_REGION, since the AWS CLI uses that variable name.

geofffranks avatar Mar 27 '20 14:03 geofffranks

I seem to be having the same issue when running it on my OpenShift cluster in AWS. My service account has full admin rights, but this still panics and fails.

prashantokochavara avatar Apr 13 '20 22:04 prashantokochavara

@leakingtapan - is this a bug? Or am I missing some env variables somewhere? @geofffranks / @franklevering - were you able to resolve this?

prashantokochavara avatar Apr 14 '20 19:04 prashantokochavara

It sounds like a bug if AWS_REGION is specified but not honored, but I haven't had enough time to root-cause the issue.

leakingtapan avatar Apr 14 '20 19:04 leakingtapan

I am also facing the issue.

nitesh-sharma-trilio avatar Apr 17 '20 14:04 nitesh-sharma-trilio

Hello folks, did anyone find a way out of this? I am facing it while integrating my OCP cluster with EBS. Thanks in advance.

tanalam2411 avatar Apr 18 '20 15:04 tanalam2411

@leakingtapan -- is it possible to get a rough estimate on when this fix would be available?

prashantokochavara avatar Apr 20 '20 23:04 prashantokochavara

Hi everyone. I've been looking into this issue a bit more closely, and I can confirm that this is not a misconfiguration and is also not related to whether the AWS_REGION environment variable is defined or not.

If you follow the stack trace [1], you end up realising that the driver relies heavily on the metadata service to retrieve the current instance ID, the availability zone for topology-aware dynamic provisioning, and information about the instance family, which is used to derive the maximum number of EBS volumes that can be attached.

The way I see it, keeping in mind I'm not a member of this project, this does not look like a bug that should be fixed, but rather like a requirement of the driver that should be explicitly documented.

For the time being, I'm working around this issue by using a slightly more specific iptables rule leveraging the string extension [2] to filter only packets containing "iam/security-credentials" [3] within their first 100 bytes:

iptables --insert FORWARD 1 --in-interface eni+ --destination 169.254.169.254/32 -m string --algo bm --to 100 --string 'iam/security-credentials' --jump DROP

I would not bet on this to stop someone who REALLY wants to access this URL, but it should help in most cases. Eager to hear if anyone can think of a better solution.

[1] https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/node.go
[2] http://ipset.netfilter.org/iptables-extensions.man.html#lbCE
[3] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

mvaldesdeleon avatar Apr 24 '20 16:04 mvaldesdeleon

@mvaldesdeleon - where are you running the iptables command exactly? Is this something manually you are setting on the nodes?

prashantokochavara avatar Apr 27 '20 17:04 prashantokochavara

@prashantokochavara Martin is referring to the worker nodes, where metadata endpoint access is restricted (https://docs.aws.amazon.com/de_de/eks/latest/userguide/restrict-ec2-credential-access.html)

maxbraun avatar Apr 27 '20 18:04 maxbraun

According to the AWS docs, the metadata endpoint is a link-local address that can only be reached from the host. Can it actually be reached from inside a container? When I try it on my cluster, I'm able to curl the metadata endpoint from the host itself, but I get a "connection refused" when trying the same command from inside the ebs-csi-controller.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html

dmc5179 avatar May 20 '20 04:05 dmc5179

I was able to work around this issue by disabling the liveness containers and probes for port 9808 and then enabling hostNetwork for the CSI controller pod.
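
A rough sketch of that workaround against the controller Deployment's pod template (standard Kubernetes fields; the exact layout depends on which manifest or chart you deploy):

# ebs-csi-controller Deployment, pod template spec (abridged)
spec:
  template:
    spec:
      hostNetwork: true          # lets the ebs-plugin container reach 169.254.169.254 via the host network
      containers:
        - name: ebs-plugin
          image: amazon/aws-ebs-csi-driver:latest
          # the port-9808 livenessProbe is dropped here, along with the
          # liveness-probe sidecar container that serves it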

dmc5179 avatar May 20 '20 20:05 dmc5179

@dmc5179 - can you show me your yaml file and how you are making these changes for the controller pod/deployment?

prashantokochavara avatar May 21 '20 13:05 prashantokochavara

@prashantokochavara Yes, I'll add them here.

Wanted to note that I found why hostNetwork is not needed on vanilla Kubernetes: the OpenShift SDN won't route the link-local requests, but vanilla Kubernetes will. I've created some other feature requests to support this driver on OpenShift.

dmc5179 avatar May 21 '20 13:05 dmc5179

Thanks @dmc5179. I haven't tried it on vanilla k8s yet, but that is what I was expecting. My env is OpenShift 4.3/4.4, and I'm able to get through with static provisioning, but wherever dynamic provisioning is required (metadata), I run into those network issues. Hopefully your workaround is all I need :)

prashantokochavara avatar May 21 '20 14:05 prashantokochavara

Here is a link to my fork of the driver where I modified the helm chart to support OpenShift 4.3. Note that I modified the 0.3.0 chart and ran that on my cluster, while the git repo contains version 0.4.0 of the helm chart; I don't see any reason why my modifications would not work there. That said, if you need to use the 0.3.0 version of the chart, take the changes I made in my git repo and apply them to the deployment.yaml and daemonset.yaml files in the 0.3.0 chart. Let me know if that makes sense.

https://github.com/dmc5179/aws-ebs-csi-driver

Another member of our team tried this modification in AWS commercial and it worked.

Note that there is one additional modification in my version of the chart. Because I'm deploying in a private AWS region, I need to add certificates to support the custom API endpoint. I could not find any way to get them into the CSI driver containers (long story). What I ended up doing is a hostPath mount of /etc/pki, which works. If you do not want that host mount and/or don't need it, just comment it out in the files that I changed in my version of the driver.
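
For reference, the /etc/pki host mount mentioned above would look roughly like this in the container and volume sections (the volume name here is illustrative, not necessarily what the fork uses):

      containers:
        - name: ebs-plugin
          volumeMounts:
            - name: host-pki         # illustrative volume name
              mountPath: /etc/pki
              readOnly: true
      volumes:
        - name: host-pki
          hostPath:
            path: /etc/pki
            type: Directory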

dmc5179 avatar May 21 '20 17:05 dmc5179

Thanks @dmc5179, appreciate it! I am going to try this on the 0.5.0 version of the helm chart (I need snapshots/expansion).

prashantokochavara avatar May 21 '20 17:05 prashantokochavara

@dmc5179 - I was finally able to get past the metadata issue. Thanks!

In addition to the changes that you had in your fork, I also had to disable the liveness container and probes that you had mentioned earlier. I basically commented out those parts in the node and controller .yaml files.

Adding them here in case anyone else needs it. aws-ebs-csi-driver.zip

prashantokochavara avatar Jun 02 '20 19:06 prashantokochavara

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Aug 31 '20 20:08 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot avatar Sep 30 '20 21:09 fejta-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

fejta-bot avatar Oct 30 '20 21:10 fejta-bot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 30 '20 21:10 k8s-ci-robot

Requiring metadata kinda sucks, considering the recommended EKS configuration with IRSA is to disable metadata API access.

nhoughto avatar Mar 02 '21 06:03 nhoughto

/reopen /help-wanted /lifecycle frozen

The node service needs metadata to discover the instance ID and topology (outpost, zone, region) info: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/7278cef1d7a1a63a43f733bfe237e1222678e7c3/pkg/cloud/metadata.go#L91. Maybe with the downward API some of this could be retrieved from the Node object.
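
As an illustration of the downward API idea (not something the driver does today), the node pod could at least learn which Node object it runs on and look the rest up from that object or the EC2 API; the variable name here is hypothetical:

          env:
            - name: CSI_NODE_NAME              # hypothetical name the driver would read
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName     # provided by the Kubernetes downward API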

The controller service needs it to discover the region. Maybe it could use the AWS API instead, since it will need AWS API permissions anyway.

Also, there is now instance metadata v2 (IMDSv2), and I haven't tested it.

wongma7 avatar Mar 02 '21 18:03 wongma7

@wongma7: Reopened this issue.

In response to this:

/reopen /help-wanted /lifecycle frozen

Node service needs metadata to discover instance id, and topology (outpost, zone, region) info https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/7278cef1d7a1a63a43f733bfe237e1222678e7c3/pkg/cloud/metadata.go#L91 Maybe with downward API some of this could be retrieved from Node object.

Controller service needs it to discover region. Maybe it could use AWS API instead since it will need AWS API permissions anyway.

Also, there is now an instance metadata v2 and I haven't tested it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 02 '21 18:03 k8s-ci-robot

For the node plugin, there are three scenarios:

1 IMDS is accessible from kubelet and containers

CSI works; we assume this today. We put hostNetwork in our YAMLs to ensure that the container can reach the metadata endpoint on the host.

2 IMDS is accessible from kubelet but NOT from containers

CSI doesn't work. The container needs to discover its zone and instance ID somehow. The only hint it might have is the node name/IP from kubelet via the downward API, which we can assume matches the EC2 node's internal DNS name. (If kubelet has the in-tree cloud provider enabled, the node name will have been retrieved from instance metadata and will match.) In theory the container could then call DescribeInstances with this internal DNS name to discover the rest of the info. But if the environment has already restricted instance metadata access, asking for even more API permissions seems unreasonable.

3 IMDS is NOT accessible from kubelet NOR from containers (IMDS is disabled: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html).

Same as above, except in this case even kubelet.nodeName might be unreliable, because the in-tree cloud provider will fail to get a metadata response and the node name must be derived from the OS hostname, which is not guaranteed to match the internal DNS name unless the user sets it. However, such a Node shouldn't even be allowed to join the cluster, so, again, from the CSI driver's perspective we can assume that kubelet.nodeName == internal DNS name. (Really, this is equivalent to scenario 2 from the CSI driver's perspective; I describe it only to be exhaustive.)

Two additional variables to consider for each scenario:

1. Migration.

In scenario 1, both the in-tree driver and the CSI driver work fine. In scenarios 2/3, the in-tree driver works but the CSI driver does not, so we are breaking compatibility for scenarios 2/3.

2. External cloud provider.

This is really just another technicality, but in this case kubelet doesn't even try to access instance metadata. It is assumed that its node name will match the internal DNS name somehow, without the aid of metadata, and the external cloud provider will then fill in kubelet's providerID: https://github.com/kubernetes/cloud-provider-aws/issues/72

Possible solutions:

  • Use the downward API to retrieve the node name, safely assume that it is equal to the internal DNS name, then call DescribeInstances to retrieve the rest.
    • This doesn't solve any migration compatibility issue, because we are still requiring MORE than what the in-tree driver did: either instance metadata access or DescribeInstances API access.
  • Use the downward API to retrieve the node name, then call the k8s API to Get that Node and retrieve its providerID, which includes the zone and instance ID, assuming the cloud node controller has set it. RBAC to allow Get on nodes would be auto-installed by helm (a sketch of the required RBAC follows this list). Today kubelet/the in-tree driver has read/write access to its own Node object, so this is not in any way a breaking migration change. TY msau42@
  • Augment the downward API to include providerID, which includes the zone and instance ID (but does it also include outpost info??): https://github.com/kubernetes/kubernetes/issues/99285. Basically the same as above, but we skip the Get node call.
    • For outposts, there is no concern about breaking migration compatibility, since the in-tree driver never supported outpost provisioning.
  • Add a fallback for people to explicitly set info that cannot be automatically retrieved, the same way kubelet lets you set its node name equal to the internal DNS name when metadata is not accessible.
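
For the "Get node" approach, the RBAC that helm would need to ship is small; a sketch, with an illustrative name:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ebs-csi-node-reader          # illustrative name
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]                   # enough to read spec.providerID off the Node object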

(By the way, IMDSv2 has no bearing on any of this; I wasn't sure at first, but basically we just need to be on a recent SDK version and it will "just work".)

wongma7 avatar Mar 19 '21 19:03 wongma7

For the controller plugin, we only need metadata to get the region, and we already provide a fallback for that via the AWS_REGION env variable, so it's not as pressing an issue.

Otherwise, we have to choose one of the same approaches as above.

As mentioned above, the DescribeInstances approach would assume that the node name is equal to the internal DNS name, which could change in a future version of the cloud provider.

The Get node approach would assume that the cloud provider has set the provider ID.

Neither assumption is guaranteed to hold (since a user can override kubelet's node name or provider ID, run some fork of the AWS cloud provider, etc.), but either is better than just crapping out when instance metadata is unavailable.

wongma7 avatar Mar 24 '21 19:03 wongma7

We've run into this issue and can confirm that even with the AWS_REGION variable set, this still fails in an environment with IMDSv2 and a hop limit of 1 (an AWS EKS security best practice). This is due to: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/node.go#L85

This is with v0.10.0

groodt avatar Apr 06 '21 02:04 groodt