containers-roadmap
[EKS] [request]: Better support for removing instance metadata endpoint access
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
Removing access to the instance metadata endpoint is documented as good security posture in your documentation: https://docs.aws.amazon.com/eks/latest/userguide/restrict-ec2-credential-access.html. However, there are a couple of improvements that could be made here:
- (less important): This feels like it could be a checkbox somewhere in EKS. I'm not sure if I should have to use a custom launch template with userdata to achieve this.
- (more important): Following the documentation linked above (with a custom launch template and custom userdata) stops the Amazon CloudWatch agent from working. Logs below:
2020/09/02 06:07:45 I! 2020/09/02 06:07:42 E! ec2metadata is not available
2020/09/02 06:07:42 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2020/09/02 06:07:43 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:44 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:45 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:45 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPrem
2020/09/02 06:07:45 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2020/09/02 06:07:45 Reading json config file path: /etc/cwagentconfig/..2020_09_02_04_57_01.343707504/cwagentconfig.json ...
2020/09/02 06:07:45 Find symbolic link /etc/cwagentconfig/..data
2020/09/02 06:07:45 Find symbolic link /etc/cwagentconfig/cwagentconfig.json
2020/09/02 06:07:45 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
Valid Json input schema.
Got Home directory: /root
No csm configuration found.
No metric configuration found.
Configuration validation first phase succeeded
2020/09/02 06:07:45 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2020/09/02 06:07:45 I! AmazonCloudWatchAgent Version 1.245315.0.
2020-09-02T06:07:45Z I! will use file based credentials provider
2020-09-02T06:07:45Z I! Starting AmazonCloudWatchAgent (version 1.245315.0)
2020-09-02T06:07:45Z I! Loaded outputs: cloudwatchlogs
2020-09-02T06:07:45Z I! Loaded inputs: cadvisor k8sapiserver
2020-09-02T06:07:45Z I! Tags enabled:
2020-09-02T06:07:45Z I! Agent Config: Interval:1m0s, Quiet:false, Hostname:"ip-172-21-189-66.ap-southeast-2.compute.internal", Flush Interval:1s
2020-09-02T06:07:45Z I! k8sapiserver Switch New Leader: ip-172-21-188-238.ap-southeast-2.compute.internal
2020-09-02T06:08:06Z E! ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
I suspect the CloudWatch agent needs access to the instance metadata service. However, it feels like it shouldn't, and that it should be able to collect the information it needs via IRSA permissions, and possibly by looking at the labels on the node (if all it really needs is the instance ID).
Which service(s) is this request for? EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Trying to run the CloudWatch agent using IRSA in a cluster with a strong security posture, where pods have been blocked from accessing the instance metadata endpoint.
Are you currently working around this issue? You would need to use something else for network policy enforcement, such as Calico. However this comes with extra overhead of managing another cluster service, and deficiencies with the deployment of Calico resources (having to use calicoctl instead of kubectl for some resources).
As an update here, we have updated our docs with a simpler option to disable IMDS for pods by using IMDSv2 and the hop limit
https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html
We do plan to add this as a checkbox option to managed groups in the future
Is there an example of how we can add this for managed groups for now during bootstrapping (especially using eksctl)?
For EKS managed groups, you may want to set the eksctl CLI option --disable-pod-imds or the config option disablePodIMDS.
We experienced the same issue with the CloudWatch agent on EKS when enabling IMDSv2 with a token hop limit of 1 on the launch template, as specified in the best-practice documentation. We set the token hop limit to 2 for now, which feels like it goes against what is recommended.
@sharkztex Did you also configure the iptables? I couldn't get it to work when using the iptables.
@gfvirga No we didn't as we don't have a requirement to access IMDSv1.
Any update on how to run the CloudWatch Agent with IMDS? We also have it disabled but want to run the CloudWatch Agent and are getting the same issue.
Is there any news about this issue? I am unable to enable CloudWatch metrics in my EKS cluster.
I was finally able to fix this issue; I found the right configuration.
Since I'm using Terraform to provision the EKS cluster and the managed node groups, I had to add the following configuration to the aws_launch_template resource:
metadata_options {
  http_endpoint = "enabled"
  # The following two settings work around https://github.com/aws/containers-roadmap/issues/1060
  http_tokens                 = "optional"
  http_put_response_hop_limit = 2
}
After that, if you're using Bottlerocket, you have to set the right socket path on the CloudWatch pods.
Since I'm using the Helm chart aws-cloudwatch-metrics, I had to add the following configuration, according to this issue (https://github.com/aws/amazon-cloudwatch-agent/issues/188):
"containerdSockPath" = "/run/dockershim.sock"
@bersanf did you also configure iptables?
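For anyone applying the Helm setting from @bersanf's comment through Terraform, here is a minimal sketch of how the value might be passed to the aws-cloudwatch-metrics chart. It is not taken from the thread: the resource name, the namespace, and the "my-cluster" cluster name are placeholders, and it assumes the chart is pulled from the eks-charts repository.

```hcl
# Sketch only. "cloudwatch_metrics", the namespace and "my-cluster" are
# placeholders chosen for illustration, not values from this thread.
resource "helm_release" "cloudwatch_metrics" {
  name             = "aws-cloudwatch-metrics"
  repository       = "https://aws.github.io/eks-charts" # assumed chart source
  chart            = "aws-cloudwatch-metrics"
  namespace        = "amazon-cloudwatch"
  create_namespace = true

  # Passing values as YAML avoids depending on a particular syntax for
  # the provider's "set" argument.
  values = [yamlencode({
    clusterName = "my-cluster"
    # Socket path reported above as working on Bottlerocket nodes.
    containerdSockPath = "/run/dockershim.sock"
  })]
}
```

This only covers the socket path; the metadata_options change on the launch template is still needed for the agent to reach IMDS.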
@mikestef9 are there still plans to expose IMDS customization/restriction in the managed node groups API? I'm looking for a means to restrict down to best practice (tokens required, hop limit 1) without needing to define a custom launch template.
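For reference, the restriction described here (tokens required, hop limit 1) currently has to be expressed through a custom launch template. A minimal Terraform sketch follows, assuming a managed node group; all resource names, variables, and scaling sizes are hypothetical. As reported earlier in this thread, the CloudWatch agent stopped working for some users with a hop limit of 1.

```hcl
# Sketch only. The variables, resource names and scaling sizes below are
# hypothetical placeholders, not values from this thread.
variable "cluster_name" { type = string }
variable "node_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_launch_template" "restricted_imds" {
  name_prefix = "eks-restricted-imds-"

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required" # IMDSv2 only, no IMDSv1 fallback
    http_put_response_hop_limit = 1          # token response is not forwarded past one network hop
  }
}

resource "aws_eks_node_group" "restricted" {
  cluster_name    = var.cluster_name
  node_group_name = "restricted-imds"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  # Attach the custom launch template so its metadata options apply to the nodes.
  launch_template {
    id      = aws_launch_template.restricted_imds.id
    version = aws_launch_template.restricted_imds.latest_version
  }
}
```

The workaround posted above differs only in the metadata_options block (http_tokens = "optional", hop limit 2), which is what makes the node role's credentials reachable from pods.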
Any updates?
We disabled IMDS as per best practice:
Restrict access to the instance profile assigned to the worker node
But we bumped hop count to 2 because apparently some application needs IMDS:
As a consequence, I assume, mkat complains that IMDSv2 is accessible:
IMDSv2 is accessible: any pod can retrieve credentials for the AWS role my-cluster-node-group-role
How can we improve this?
% mkat eks test-imds-access
Connected to EKS cluster my-cluster
Testing if IMDSv1 and IMDSv2 are accessible from pods by creating a pod that attempts to access it
IMDSv2 is accessible: any pod can retrieve credentials for the AWS role my-cluster-node-group-role
IMDSv1 is not accessible to pods in your cluster: able to establish a network connection to the IMDS, but no credentials were returned