
Adapter pod not able to connect to Ec2Metadata

Open zeu5 opened this issue 4 years ago • 7 comments

I'm running the CloudWatch adapter with IRSA configured. While other pods running on the node can reach the EC2 metadata API, the CloudWatch adapter pod throws the following error:

```
client.go:97] err: EC2RoleRequestError: no EC2 instance role found
caused by: RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/iam/security-credentials: dial tcp 169.254.169.254:80: connect: connection refused
E0804 12:32:13.788312       1 provider_external.go:31] bad request: EC2RoleRequestError: no EC2 instance role found
```

We have set up IRSA with the required permissions. Since the pod cannot connect to the EC2 metadata API, the region is not picked up either:

```
I0805 07:49:09.832766       1 controller.go:57] initializing controller
E0805 07:49:09.832972       1 util.go:14] unable to get current region information, Get http://169.254.169.254/latest/meta-data/placement/availability-zone/: dial tcp 169.254.169.254:80: connect: connection refused
I0805 07:49:09.832988       1 client.go:26] using AWS Region:
```

Setting AWS_REGION does not solve the issue either.
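For reference, this is the kind of deployment excerpt we use to set the variable (a minimal sketch; the container name, image tag, and region value here are illustrative, not our exact manifest):

```yaml
# Hypothetical excerpt from the adapter Deployment; only the env stanza matters.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-cloudwatch-adapter
spec:
  template:
    spec:
      containers:
        - name: k8s-cloudwatch-adapter
          image: chankh/k8s-cloudwatch-adapter:v0.9.0
          env:
            - name: AWS_REGION
              value: us-west-2   # example region
```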

zeu5 avatar Aug 04 '20 13:08 zeu5

Hi @zeu5 which version of the adapter are you using?

chankh avatar Aug 07 '20 03:08 chankh

We encountered this error when using a much older version, v0.2.0. The region gets picked up, but later, when the adapter tries to pull in metrics, we hit this error:

```
client.go:97] err: EC2RoleRequestError: no EC2 instance role found
caused by: RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/iam/security-credentials: dial tcp 169.254.169.254:80: connect: connection refused
E0804 12:32:13.788312       1 provider_external.go:31] bad request: EC2RoleRequestError: no EC2 instance role found
```

The region does not get picked up when we use the latest image chankh/k8s-cloudwatch-adapter:v0.9.0:

```
I0805 07:49:09.832766       1 controller.go:57] initializing controller
E0805 07:49:09.832972       1 util.go:14] unable to get current region information, Get http://169.254.169.254/latest/meta-data/placement/availability-zone/: dial tcp 169.254.169.254:80: connect: connection refused
I0805 07:49:09.832988       1 client.go:26] using AWS Region:
```

zeu5 avatar Aug 10 '20 05:08 zeu5

Hi @zeu5, if you are using IRSA, please try v0.8.0, because previous versions used the AWS Go SDK v2, which lacked support for Web Identity Tokens (ref #19).

Also, may I know whether you are running the adapter on Fargate or on an EC2 worker node?

chankh avatar Aug 11 '20 05:08 chankh

Hi @chankh, we tried v0.8.0 as you suggested. We are running the adapter on an EC2 worker node. Setting the AWS_REGION environment variable does not work:

```
E0805 07:49:09.832972       1 util.go:14] unable to get current region information, Get http://169.254.169.254/latest/meta-data/placement/availability-zone/: dial tcp 169.254.169.254:80: connect: connection refused
I0805 07:49:09.832988       1 client.go:26] using AWS Region:
```

Shivam9268 avatar Aug 11 '20 08:08 Shivam9268

Do you have anything that blocks the container from calling the EC2 metadata API? The adapter retrieves the region ID from the EC2 metadata API at http://169.254.169.254, and it is getting connection refused. It's probably worth validating that connection first.

How did you set AWS_REGION? It should work, given that this is the default behavior of the SDK.

chankh avatar Aug 12 '20 08:08 chankh

Hi @chankh, we set AWS_REGION by passing it as an environment variable in the deployment. The adapter is also unable to read the token file mounted from the service account. It gives the error:

```
unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied
```

Setting the security context in the pod spec as:

```yaml
securityContext:
  fsGroup: 65534
```

solves the issue. I have raised PR #46 for this; please take a look.
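For anyone else hitting this, here is a sketch of where the setting lands in a Deployment manifest (names and image tag are illustrative). Note that `fsGroup` belongs to the pod-level `securityContext`, not the container-level one; it makes the projected token volume group-readable by the adapter's non-root user:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-cloudwatch-adapter   # illustrative name
spec:
  template:
    spec:
      securityContext:
        fsGroup: 65534   # pod-level: token file becomes readable by this group
      containers:
        - name: k8s-cloudwatch-adapter
          image: chankh/k8s-cloudwatch-adapter:v0.8.0
```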

We have currently worked around the AWS_REGION issue by reading the region with os.LookupEnv() in the code.

Shivam9268 avatar Aug 15 '20 08:08 Shivam9268

You are right, I forgot to commit the changes after adding the security context for my local deployment. Thanks for submitting that PR.

chankh avatar Aug 17 '20 08:08 chankh