containers-roadmap [EKS] [request]: Better support for removing instance metadata endpoint access

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

Your documentation describes removing access to the instance metadata endpoint as good security practice: https://docs.aws.amazon.com/eks/latest/userguide/restrict-ec2-credential-access.html. However, a couple of improvements could be made here:

(less important): This feels like it could be a checkbox somewhere in EKS. I'm not sure I should have to use a custom launch template with user data to achieve this.
(more important): Performing the actions in the documentation linked above (with a custom launch template and custom user data) stops the Amazon CloudWatch agent from working. Logs below:

2020/09/02 06:07:45 I! 2020/09/02 06:07:42 E! ec2metadata is not available
2020/09/02 06:07:42 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2020/09/02 06:07:43 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:44 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:45 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020/09/02 06:07:45 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPrem
2020/09/02 06:07:45 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2020/09/02 06:07:45 Reading json config file path: /etc/cwagentconfig/..2020_09_02_04_57_01.343707504/cwagentconfig.json ...
2020/09/02 06:07:45 Find symbolic link /etc/cwagentconfig/..data
2020/09/02 06:07:45 Find symbolic link /etc/cwagentconfig/cwagentconfig.json
2020/09/02 06:07:45 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
Valid Json input schema.
Got Home directory: /root
No csm configuration found.
No metric configuration found.
Configuration validation first phase succeeded

2020/09/02 06:07:45 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2020/09/02 06:07:45 I! AmazonCloudWatchAgent Version 1.245315.0.
2020-09-02T06:07:45Z I! will use file based credentials provider
2020-09-02T06:07:45Z I! Starting AmazonCloudWatchAgent (version 1.245315.0)
2020-09-02T06:07:45Z I! Loaded outputs: cloudwatchlogs
2020-09-02T06:07:45Z I! Loaded inputs: cadvisor k8sapiserver
2020-09-02T06:07:45Z I! Tags enabled:
2020-09-02T06:07:45Z I! Agent Config: Interval:1m0s, Quiet:false, Hostname:"ip-172-21-189-66.ap-southeast-2.compute.internal", Flush Interval:1s
2020-09-02T06:07:45Z I! k8sapiserver Switch New Leader: ip-172-21-188-238.ap-southeast-2.compute.internal
2020-09-02T06:08:06Z E! ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance

I suspect the CloudWatch agent needs access to the instance metadata service. However, it feels like it shouldn't: it should be able to collect the information it needs via IRSA permissions, and possibly by looking at the labels on the node (if all it really needs is the instance ID).
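To illustrate the idea of getting the instance ID from the node object rather than from IMDS, here is a minimal sketch. It assumes the node's `spec.providerID` has the form `aws:///<availability-zone>/<instance-id>`, which is what EC2-backed nodes typically report (it can be read with `kubectl get node <name> -o jsonpath='{.spec.providerID}'`). The function name is hypothetical, and this is not how the CloudWatch agent currently works.

```python
import re

# Assumption: EC2-backed nodes carry a providerID of the form
# "aws:///<availability-zone>/<instance-id>". Other node types
# (for example Fargate) use different formats and are rejected here.
_PROVIDER_ID = re.compile(r"^aws:///(?P<zone>[^/]+)/(?P<instance_id>i-[0-9a-f]+)$")


def instance_id_from_provider_id(provider_id: str) -> str:
    """Extract the EC2 instance ID from a Kubernetes node providerID."""
    match = _PROVIDER_ID.match(provider_id)
    if match is None:
        raise ValueError(f"not an EC2 providerID: {provider_id!r}")
    return match.group("instance_id")


# Example value is made up for illustration.
print(instance_id_from_provider_id("aws:///ap-southeast-2a/i-0123456789abcdef0"))
# prints "i-0123456789abcdef0"
```

A component doing this would need permission to read its own node object from the Kubernetes API, but no access to the metadata endpoint.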

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Trying to run the CloudWatch agent using IRSA in a cluster with a strong security posture, where pod access to the instance metadata endpoint has been disabled.

Are you currently working around this issue? You would need to use something else for network policy enforcement, such as Calico. However, this comes with the extra overhead of managing another cluster service, and with deficiencies in how Calico resources are deployed (some resources have to be managed with calicoctl instead of kubectl).

Sep 04 '20 01:09 Niksko

As an update here, we have updated our docs with a simpler option to disable IMDS for pods by using IMDSv2 and the hop limit

https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html

We do plan to add this as a checkbox option to managed groups in the future
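For readers unfamiliar with why the hop limit matters: IMDSv2 requires a session token obtained with a PUT request, and the response to that PUT is sent with an IP TTL equal to the configured hop limit. A pod that is not on the host network sits one hop further away, so with a hop limit of 1 the token response does not reach it. The sketch below shows the two requests involved, using only the Python standard library; the helper names are hypothetical, and `fetch_instance_id` only works when run on an EC2 instance that can reach IMDS.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request for a metadata path, authenticated by the token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout: float = 2.0) -> str:
    """Fetch the instance ID via IMDSv2.

    From a pod behind a hop limit of 1, the first call is expected to
    time out, because the token response never arrives.
    """
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = metadata_request("instance-id", token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()
```

Running `fetch_instance_id()` from a test pod is one way to check whether a given hop limit actually blocks pod access in a particular cluster.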

Nov 16 '20 05:11 mikestef9

Is there an example on how we can add this for managed groups for now during bootstrapping (especially using eksctl)?

Jan 03 '21 03:01 sukrit007

For EKS managed node groups, you may want to set the eksctl CLI option --disable-pod-imds or the config option disablePodIMDS.

May 23 '21 19:05 avoidik

We experience the same issue with CloudWatch agents on EKS when enabling IMDSv2 with the token hop limit set to 1 on the launch template, as specified in the best-practice documentation. We have set the token hop limit to 2 for now, which feels like it goes against what is recommended.

Jun 23 '21 08:06 sharkztex

@sharkztex Did you also configure the iptables? I couldn't get it to work when using the iptables.

Jun 23 '21 15:06 gfvirga

@gfvirga No, we didn't, as we don't have a requirement to access IMDSv1.

Jun 24 '21 08:06 sharkztex

Any update on how to run the CloudWatch Agent with IMDS? We also have it disabled but want to run the CloudWatch Agent and are getting the same issue.

Jul 06 '21 23:07 apanzerj

Is there any news about this issue? I am unable to enable CloudWatch metrics in my EKS cluster.

Jun 30 '22 08:06 bersanf

I've finally been able to fix this issue; I found the right configuration.

Since I'm using Terraform to provision the EKS cluster and the managed node groups, I had to add the following configuration to the aws_launch_template resource:

  metadata_options {
    http_endpoint               = "enabled"
    # The following two settings work around https://github.com/aws/containers-roadmap/issues/1060
    http_tokens                 = "optional"
    http_put_response_hop_limit = 2
  }

After that, if you're using Bottlerocket, you have to set the right socket path on the CloudWatch pods. Since I'm using the Helm chart aws-cloudwatch-metrics, I had to add the following configuration, according to this issue (https://github.com/aws/amazon-cloudwatch-agent/issues/188): "containerdSockPath" = "/run/dockershim.sock"

Jun 30 '22 09:06 bersanf

@bersanf did you also configure iptables?

Jul 25 '22 12:07 vsantoshaws

As an update here, we have updated our docs with a simpler option to disable IMDS for pods by using IMDSv2 and the hop limit

https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html

We do plan to add this as a checkbox option to managed groups in the future

@mikestef9 are there still plans to expose IMDS customization/restriction in the managed node groups API? I'm looking for a means to restrict down to best practice (tokens required, hop limit 1) without needing to define a custom launch template.

Oct 26 '22 17:10 orirawlings

Any updates?

We disabled IMDS as per best practice:

Restrict access to the instance profile assigned to the worker node

But we bumped hop count to 2 because apparently some application needs IMDS:

When your application needs access to IMDS, use IMDSv2 and increase the hop limit on EC2 instances to 2

As a consequence, I assume, mkat complains that IMDSv2 is accessible:

IMDSv2 is accessible: any pod can retrieve credentials for the AWS role my-cluster-node-group-role

How to improve?

% mkat eks test-imds-access
Connected to EKS cluster my-cluster
Testing if IMDSv1 and IMDSv2 are accessible from pods by creating a pod that attempts to access it
IMDSv2 is accessible: any pod can retrieve credentials for the AWS role my-cluster-node-group-role
IMDSv1 is not accessible to pods in your cluster: able to establish a network connection to the IMDS, but no credentials were returned
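The mkat result above is consistent with a simple mental model of the two launch template settings discussed in this thread. The sketch below is only that, a simplified model under stated assumptions, not an AWS API: it assumes the metadata endpoint is enabled, that IMDSv1 answers whenever tokens are "optional", and that a pod outside the host network is exactly one hop further from IMDS than the node itself. The function name is hypothetical.

```python
def pod_imds_access(http_tokens: str, hop_limit: int, host_network: bool = False) -> dict:
    """Model which IMDS versions a pod can use for given metadata options.

    Assumptions (not taken from any AWS API):
    - the metadata endpoint itself is enabled;
    - IMDSv1 works only when http_tokens is "optional";
    - a pod not using the host network is one extra network hop away,
      and the IMDSv2 token response survives at most hop_limit hops.
    """
    extra_hops = 0 if host_network else 1
    return {
        "imdsv1": http_tokens == "optional",
        "imdsv2": hop_limit >= extra_hops + 1,
    }


# Tokens required, hop limit 2: the situation mkat reports above.
print(pod_imds_access("required", 2))   # {'imdsv1': False, 'imdsv2': True}
# Tokens required, hop limit 1: the documented restriction for pods.
print(pod_imds_access("required", 1))   # {'imdsv1': False, 'imdsv2': False}
```

Under this model, a hop limit of 2 always leaves IMDSv2 reachable from pods, so keeping it for an application that needs IMDS and blocking every other pod are in tension unless access is restricted by some other mechanism, such as the network policy or iptables approaches mentioned earlier in the thread.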

May 16 '24 18:05 joebowbeer
