aws-efs-csi-driver
efs-csi-controller won't start if IMDS access is blocked
/kind bug
What happened?
With IMDS disabled per best practices (https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html) on Bottlerocket hosts, pods from the efs-csi-controller deployment will not start.
We need something similar to the node DaemonSet's hostNetwork workaround for the controller, or for the controller to simply not need IMDS access to begin with.
F0127 18:13:01.145009 1 driver.go:54] could not get metadata from AWS: EC2 instance metadata is not available
is emitted to the log and a crash occurs.
What you expected to happen?
I expected efs-csi-controller to start. Passing in the region/instance ID/other IMDS-sourced information explicitly would be acceptable.
How to reproduce it (as minimally and precisely as possible)?
- Block IMDS access
- Deploy efs-csi-controller
Anything else we need to know?:
The DaemonSet uses hostNetwork: true to regain access to the IMDS (https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/188)
Environment
- Kubernetes version (use kubectl version): EKS 1.18
- Driver version: master
I was able to get the efs-csi-controller running by removing the liveness check and switching to hostNetwork: true until the IMDS dependency can be resolved.
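For anyone needing the same workaround, the change amounts to something like the patch below. This is a rough sketch against the upstream controller-deployment.yaml (strategic-merge-patch semantics, where livenessProbe: null deletes the field); adjust names to your actual manifest.

```yaml
# Workaround sketch: run the controller on the host network so IMDS is
# reachable again, and drop the liveness probe that kills the pod while
# metadata is unavailable. Restore the probe once the IMDS dependency
# is resolved.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: efs-csi-controller
  namespace: kube-system
spec:
  template:
    spec:
      hostNetwork: true        # same approach the node DaemonSet uses (PR #188)
      containers:
        - name: efs-plugin
          livenessProbe: null  # removed as a temporary measure
```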
Yes, it seems we'll have to add hostNetwork: true. Thank you for trying it and verifying that it fixes the issue.
(I am looking into ways to avoid talking to instance metadata altogether, since we only use it for basic things like the instance ID, but I'm not sure yet whether that's feasible.)
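If it turns out to be feasible, one shape it could take is injecting the IMDS-sourced facts explicitly. The snippet below is purely illustrative of that idea, not current driver behavior:

```yaml
# Hypothetical: supply the region via the environment so the controller
# never needs to query the metadata endpoint. AWS_REGION is honored by
# the AWS SDKs; whether the driver wires this through is exactly what
# still needs investigating.
spec:
  template:
    spec:
      containers:
        - name: efs-plugin
          env:
            - name: AWS_REGION
              value: us-east-1   # placeholder region
```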
Tried this in an on-premises physical-server environment (not an AWS environment), and it still throws the error below. Does any extra configuration need to be done for this scenario? Thanks.
could not get metadata from AWS: EC2 instance metadata is not available
@korbin Hi Korbin~ I got the same issue in an on-premises physical-server Kubernetes environment. For this workaround, I tried removing the livenessProbe section of the efs-plugin container in the controller deployment (https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/deploy/kubernetes/base/controller-deployment.yaml), but the error still exists. Do I also need to remove the liveness-probe container? Thanks.
> I was able to get the efs-csi-controller running by removing the liveness check and switching to hostNetwork: true until the IMDS dependency can be resolved.
I've also been running into this.
Another thing to consider is how to ensure that the ports used by aws-efs-csi-driver do not conflict with the ports used by aws-ebs-csi-driver. Both projects take a similar approach: each has a Deployment and a DaemonSet that require hostNetwork and hostPort to function correctly when IMDS access is blocked.
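To make the collision concrete: under hostNetwork every containerPort binds in the node's own network namespace, so the two drivers' health endpoints must use distinct ports. Port numbers below are illustrative only:

```yaml
# With hostNetwork: true the healthz endpoint listens directly on the
# node, so aws-efs-csi-driver and aws-ebs-csi-driver cannot both pick
# the same port without one of them failing to bind.
spec:
  hostNetwork: true
  containers:
    - name: efs-plugin
      ports:
        - name: healthz
          containerPort: 9909   # illustrative; must differ from the EBS driver's healthz port
          protocol: TCP
```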
@groodt yes the poor choice of default port definitely needs fixing: https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/437/files
Regarding the need for instance metadata in general: we arrived at a fix in EBS and will probably copy it over here: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/855. The tradeoff is that the driver will need permission to get Nodes, but that's a read-only permission and can come included in the RBAC artifacts; it won't require any extra work on the part of users.
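Concretely, the extra RBAC would be on the order of the following. This is a sketch modeled on the EBS change; the names here are illustrative and the shipped artifacts may differ:

```yaml
# Read-only grant so the driver can look up its own Node object
# (instance ID, topology labels) from the API server instead of IMDS.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: efs-csi-node-reader   # illustrative name
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
```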
Thanks! I think that sharing a common approach with the EBS driver makes sense if possible. I think normalising the use of IRSA where possible can only be a good thing, particularly for the AWS provided add-ons and utilities.
@wongma7 Thanks for making progress on the ebs-csi-driver (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/821). I've been able to successfully remove hostNetwork access for the controller. Any updates on a similar approach for the efs-csi-driver?
I would love to remove hostNetwork access for both EFS and EBS (node and controllers: 4 workloads total). So far, I've only been able to remove hostNetwork for the ebs-csi-controller. (1/4 workloads).
I have some updates here. I can confirm that aws-ebs-csi-driver as of v1.3.0 is able to run successfully without hostNetwork using IRSA. https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/821#issuecomment-923413504
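For reference, the IRSA side is just the standard role annotation on the controller's service account. The role ARN below is a placeholder:

```yaml
# Standard IRSA wiring: the annotated service account gets AWS
# credentials via the EKS web identity provider, so no IMDS call is
# needed for credentials.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ebs-csi-controller-sa
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/ebs-csi-controller-role  # placeholder ARN
```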
@wongma7 Is it reasonable to expect that the same will be possible with the aws-efs-csi-driver in future?
Yes, that is totally reasonable; the EFS driver needs to be able to run without hostNetwork/IMDS for exactly the same reasons as EBS. The effort entails copying the code and tests (an end-to-end test on a "real" EKS cluster whose nodes have IMDS disabled) from EBS to here. I don't have an ETA, but that is my plan.
That sounds awesome! I'll follow this issue for any updates. 🚀
@wongma7 Any updates on this issue? Really looking forward to removing hostNetwork... ;-)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Have raised a PR that I think should resolve this issue here: https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/681
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.