aws-sdk-js
Intermittent "EC2 Metadata roleName request returned error" (EINVAL) on ECS Fargate
- [x] I've gone through Developer Guide and API reference
- [x] I've checked AWS Forums and StackOverflow for answers
- [x] I've searched for previous similar issues and didn't find any solution
Describe the bug I am running a node 12.16 app on ECS Fargate. It's performing operations on files in S3 - streaming from a source bucket and uploading to a destination bucket. About 5 hours ago I started to see the following error when uploading to the destination bucket:
"originalError": {
  "message": "Could not load credentials from any providers",
  "errno": "EINVAL",
  "code": "CredentialsError",
  "syscall": "connect",
  "address": "169.254.169.254",
  "port": 80,
  "time": "2020-05-28T14:32:43.621Z",
  "originalError": {
    "message": "EC2 Metadata roleName request returned error",
    "errno": "EINVAL",
    "code": "EINVAL",
    "syscall": "connect",
    "address": "169.254.169.254",
    "port": 80,
    "time": "2020-05-28T14:32:43.620Z",
    "originalError": {
      "errno": "EINVAL",
      "code": "EINVAL",
      "syscall": "connect",
      "address": "169.254.169.254",
      "port": 80,
      "message": "connect EINVAL 169.254.169.254:80 - Local (0.0.0.0:0)"
    }
  }
}
It happened for several minutes and then stopped. Then it happened again for a couple of minutes about an hour ago and stopped, so it's intermittent. This seems very similar to what was reported in https://github.com/aws/aws-sdk-js/issues/2534#issuecomment-465308420 and asked on the forum here, but has received no answer. I'm using a task role that has PUT permissions on the destination bucket. As I said, this is intermittent, so when it's not happening, everything is working as it should. For some reason, it seems that there is an issue pulling credentials from the metadata service.
I'm going to update the SDK to the latest to see if that resolves it but I didn't see anything in the changelog that would indicate it would. Any guidance would be greatly appreciated. Thanks!
Is the issue in the browser/Node.js? Node.js
If on Node.js, are you running this on AWS Lambda? No
SDK version number v2.647.0
Hey @summera, thank you for reaching out to us. This is very hard to reproduce. Depending on your use case, is it possible to set your credentials explicitly so that the SDK doesn't touch the metadata service? I understand that shouldn't be the workaround, but I would need something more concrete and reproducible to show to the service team.
Would you be able to share your logs?
Hi @ajredniwja. Thank you for the response. After the issue occurred, I updated the SDK to 2.685.0. I also realized that the issue happened during a spike in requests so I scaled up the minimum tasks by one. Since then, I haven't seen the issue occur again. The JSON I included in my first comment (https://github.com/aws/aws-sdk-js/issues/3284#issue-626634869) is coming straight from my logs. Is there something else you were looking to see from the logs?
As for reproducing, I haven't seen this happen since upgrading and scaling up our minimum tasks. However, since this happened during high load when a lot of requests came in and therefore many parallel uploads to S3, I'm wondering if one or more of the following may be possibilities?
- Metadata service in Fargate failed to respond under high load for one reason or another.
- The SDK is or was not caching credentials retrieved from the metadata service and was therefore hitting the metadata service more than necessary and bombarding it with requests.
- Some transient issue happened with the Fargate service and has been resolved.
Do any of the above sound plausible?
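To make the second hypothesis concrete, here is a minimal sketch of what caching with a shared in-flight request could look like. It is not the SDK's actual implementation; `createCachedProvider`, `fetchCredentials`, `expiresAt`, and `refreshWindowMs` are hypothetical names chosen for illustration. The idea is only that many parallel uploads should result in one metadata request, not one per upload.

```javascript
// Hypothetical helper (not part of aws-sdk): wraps any async credential
// fetcher so that concurrent callers share a single in-flight request and
// reuse the result until it is close to expiring.
function createCachedProvider(fetchCredentials, refreshWindowMs = 60000, now = Date.now) {
  let cached = null;   // last successful result, expected to carry expiresAt (ms)
  let inFlight = null; // promise for a fetch that is currently running

  return function getCredentials() {
    const fresh = cached !== null && cached.expiresAt - now() > refreshWindowMs;
    if (fresh) return Promise.resolve(cached);

    if (inFlight === null) {
      inFlight = Promise.resolve()
        .then(fetchCredentials)
        .then((creds) => {
          cached = creds;
          return creds;
        })
        .finally(() => {
          // Clear the marker on success and on failure so a later call can retry.
          inFlight = null;
        });
    }
    return inFlight;
  };
}
```

With this shape, a burst of callers during a request spike all await the same promise, and a failed fetch does not poison the cache.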
I cannot point you towards any of those with complete certainty because we don't have any concrete evidence.
Can you use the following and collect logs for both cases? That way we can compare and come to some conclusion:
NODE_DEBUG=cluster,net,http,fs,tls,module,timers node app.js
Makes sense, though I was only asking about plausibility. If any of those are not plausible, it makes it easier to focus efforts.
Which two cases are you referring to exactly?
I was talking about the case where you see the error and the case where you don't, but I think that might be very hard to catch since this is an intermittent error.
Yea, as I mentioned above, I haven't seen this happen since updating the SDK and increasing the minimum ECS tasks by one, so I don't have any logs to share of this happening again. The fact that it was intermittent and is hard to reproduce is why I was asking what might be plausible to see if it's worth descending the rabbit hole and spending time to investigate further.
Hi everyone, I'm having exactly the same issue @summera reported, with almost the same setup. It's very intermittent: we have 10-15 clusters receiving a few thousand requests, and the issue seems to arise about once a week, so it's very rare. I had to set up CloudWatch alarms with a log filter to catch these, so we're monitoring very closely.
ECS task, Fargate managed, Node.js 13 image built from node:13.10-alpine, task deployed through CF, with a few ENVs set (nothing new). At the code level, we're using aws-sdk 2.701.0, and since my first access is usually to DynamoDB, the issue arises when querying Dynamo.
The weirdest thing is that the issue arises in a task that has been running for quite a long time, in the middle of a bunch of successful requests. That said, I would rule out any configuration issue, but not the SDK; however, the clues (for me) point to the ECS metadata service being unavailable for some reason.
One detail is that we use New Relic on some apps, so the trace is faulted for debugging purposes.
Any thoughts?
ckcpb72fh02l401x5470g8ctn-ckcpb72fh02l501x53ikgcj1p [ERROR] [] CredentialsError: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1 - Error: connect EINVAL 169.254.169.254:80 - Local (0.0.0.0:0)
at internalConnect (net.js:921:16)
at defaultTriggerAsyncIdScope (internal/async_hooks.js:313:12)
at net.js:1011:9
at Shim.applySegment (/usr/src/httpd/node_modules/newrelic/lib/shim/shim.js:1430:20)
at wrapper (/usr/src/httpd/node_modules/newrelic/lib/shim/shim.js:2092:17)
at processTicksAndRejections (internal/process/task_queues.js:79:11)
Hi, I am following this doc (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html) to select the region dynamically in AWS. When I tested the code on AWS ECS Fargate, it gave me the error below:
{ Error: connect EINVAL 169.254.169.254:80 - Local (0.0.0.0:0)
at internalConnect (net.js:882:16)
at defaultTriggerAsyncIdScope (internal/async_hooks.js:294:19)
at defaultTriggerAsyncIdScope (net.js:972:9)
at process._tickCallback (internal/process/next_tick.js:61:11)
errno: 'EINVAL',
code: 'EINVAL',
syscall: 'connect',
address: '169.254.169.254',
port: 80
}
However, it runs perfectly on an ECS EC2 task. I use "aws-sdk": "^2.701.0". It's JS code in a Docker container. Any solution appreciated.
A few occurrences this week. @ajredniwja, do you believe it is better to open an internal ticket for this? Getting worried.
Yes. Let's open a ticket. We will need to subscribe to dev support on their prod account. I think you can do that using your role; otherwise, use the root account. Could you do that, please?
On Fri, Jul 31, 2020 at 9:37 AM Gabriel Pacheco [email protected] wrote:
Few occurrences this week. @ajredniwja do you believe is better to open an internal ticket for this? Getting worried.
Getting same issue @ajredniwja
Same issue here. I definitely think it has something to do with ECS Fargate, although it does work on some of my S3 put object requests. I tried to disable this request with AWS_EC2_METADATA_DISABLED, but the error still happens; now it is:
CredentialsError: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1
I don't use AWS_* env vars for credentials, since the ECS Fargate task has access to S3 via my task's IAM role.
Using the AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars with an IAM user works, but I should be able to rely on the IAM role built into the ECS task.
I keep getting this issue even though my configuration is correct.
Same issue here. NodeJS running on fargate. SDK version 2.745.0
Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1
Seeing this exact issue as well. IAM role needs to be fixed
Bump. We are seeing this too. ECS/Fargate and node.
Error: ENOENT: no such file or directory, open '/root/.aws/config'
at Object.openSync (fs.js:440:3)
at /usr/src/app/node_modules/dd-trace/packages/dd-trace/src/tracer.js:91:53
at /usr/src/app/node_modules/dd-trace/packages/dd-trace/src/tracer.js:43:56
at Scope._activate (/usr/src/app/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:51:14)
at Scope.activate (/usr/src/app/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
at DatadogTracer.trace (/usr/src/app/node_modules/dd-trace/packages/dd-trace/src/tracer.js:43:35)
at Object.openSync (/usr/src/app/node_modules/dd-trace/packages/dd-trace/src/tracer.js:91:23)
at Object.readFileSync (fs.js:342:35)
at /usr/src/app/node_modules/dd-trace/packages/dd-trace/src/tracer.js:91:53
at /usr/src/app/node_modules/dd-trace/packages/dd-trace/src/tracer.js:43:56
at Scope._activate (/usr/src/app/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:51:14)
at Scope.activate (/usr/src/app/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
at DatadogTracer.trace (/usr/src/app/node_modules/dd-trace/packages/dd-trace/src/tracer.js:43:35)
at Object.readFileSync (/usr/src/app/node_modules/dd-trace/packages/dd-trace/src/tracer.js:91:23)
at Object.readFileSync (/usr/src/app/node_modules/aws-sdk/lib/util.js:95:26)
at IniLoader.parseFile (/usr/src/app/node_modules/aws-sdk/lib/shared-ini/ini-loader.js:6:47)
at IniLoader.loadFrom (/usr/src/app/node_modules/aws-sdk/lib/shared-ini/ini-loader.js:56:30)
at isEndpointDiscoveryApplicable (/usr/src/app/node_modules/aws-sdk/lib/discover_endpoint.js:299:58)
at Request.discoverEndpoint (/usr/src/app/node_modules/aws-sdk/lib/discover_endpoint.js:328:8)
at Request.callListeners (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:102:18)
at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/usr/src/app/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/usr/src/app/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /usr/src/app/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:685:12)
at Request.callListeners (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
I had the same problem. It cost me quite some headache because I had this running in AWS Fargate and debugging is not that easy there.
The error means the JavaScript SDK cannot find the AWS credentials. If nothing is configured, the SDK tries to load the credentials from different places. Here you can see in what order the SDK tries to load them: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/setting-credentials-node.html
My error was quite embarrassing: I just had a typo in my environment variables. My variable was AWS_ACCESSS_KEY_ID instead of AWS_ACCESS_KEY_ID. (Quite hard to see the difference, right?)
So it's probably worth double-checking the names of your environment variables (or config files).
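A typo like that is easy to catch mechanically. The following is a rough sketch, assuming you only care about the three standard credential variable names; `findLikelyTypos` and `editDistance` are hypothetical helpers written for this comment, not part of the SDK. It flags any environment variable whose name is within a small edit distance of an expected name without matching it exactly.

```javascript
// Names the SDK looks for when reading credentials from the environment.
const EXPECTED = ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN'];

// Levenshtein distance computed with a rolling row.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const row = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(prev[j] + 1, row[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = row;
  }
  return prev[b.length];
}

// Hypothetical helper: returns near-miss variable names found in `env`.
function findLikelyTypos(env, maxDistance = 2) {
  const typos = [];
  for (const name of Object.keys(env)) {
    if (EXPECTED.includes(name)) continue; // exact match, nothing to report
    for (const expected of EXPECTED) {
      if (editDistance(name, expected) <= maxDistance) {
        typos.push({ found: name, expected });
      }
    }
  }
  return typos;
}
```

Calling `findLikelyTypos(process.env)` at startup and logging a warning when the result is non-empty would surface this kind of mistake. Only variable names are inspected, so no secret values end up in the log.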
@antonpirker you're supposed to be able to pass an IAM role to a Task's containers in ECS, meaning you should be able to use the Node SDK w/o relying on access/secret IAM keys.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
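When a task role is attached, ECS is expected to inject AWS_CONTAINER_CREDENTIALS_RELATIVE_URI into the container, and the SDK should then fetch credentials from the container credentials endpoint instead of 169.254.169.254. A quick way to see which sources a container's environment makes available is a small diagnostic like the sketch below. `describeCredentialSources` is a hypothetical helper, the checks are a simplification of the provider chain rather than the SDK's own logic, and treating any non-empty AWS_EC2_METADATA_DISABLED value as "disabled" is an assumption.

```javascript
// Hypothetical diagnostic: lists the credential sources that the given
// environment appears to make available. This simplifies the SDK's provider
// chain and ignores shared config files and process credentials.
function describeCredentialSources(env) {
  const sources = [];
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    sources.push('static keys from environment variables');
  }
  if (env.AWS_CONTAINER_CREDENTIALS_RELATIVE_URI || env.AWS_CONTAINER_CREDENTIALS_FULL_URI) {
    sources.push('container credentials endpoint (ECS task role)');
  }
  if (env.AWS_WEB_IDENTITY_TOKEN_FILE && env.AWS_ROLE_ARN) {
    sources.push('web identity token file');
  }
  // Assumption: any non-empty value disables the instance metadata lookup.
  if (!env.AWS_EC2_METADATA_DISABLED) {
    sources.push('EC2 instance metadata service (169.254.169.254)');
  }
  return sources;
}
```

If `describeCredentialSources(process.env)` inside a Fargate task reports only the instance metadata service, the task definition most likely has no task role attached, which would be consistent with the SDK falling through to 169.254.169.254.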
I encountered the same error and have been trying to fix it. My guess: is the ENI temporarily (or permanently) down? I focused on the 169.254.... IP address in that error. In my case, once the error occurred, other AWS API calls (not only S3 put) showed the same behavior. I'll try to confirm this assumption.
Hey everyone, if there is a reproducible case, can you please share it? An internal ticket was opened for this, but no reproducible case was provided. It seems to happen under high memory/CPU usage, so retrying the request is worth considering.
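For anyone who wants to try the retry suggestion, a minimal sketch could look like the following. `withRetry` is a hypothetical helper, not an SDK API, and the set of retryable codes is simply taken from the error output earlier in this thread; adjust both to your case.

```javascript
// Error codes seen in this thread that are treated as transient here.
const RETRYABLE_CODES = new Set(['CredentialsError', 'EINVAL']);

// Hypothetical helper: runs `operation` and retries it with exponential
// backoff when it fails with one of the codes above.
async function withRetry(operation, { attempts = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const retryable = RETRYABLE_CODES.has(err && err.code);
      if (!retryable || attempt >= attempts) throw err;
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Usage would be something like `withRetry(() => s3.upload(params).promise())`, where `s3` and `params` are your own client and request. If the upload body is a stream, it has to be re-created inside the callback for each attempt, since a partially consumed stream cannot be replayed.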

I can either enable or disable my AWS_CONFIG_FILE with the same result. I'm also using AWS.config.update() to update my credentials every time my lambda runs. So I have credentials in both the recommended credentials file and I'm explicitly updating them on the fly to something that worked last week. I'm trying to trigger a lambda from my invocation lambda. In short, PHP sends a cURL request to invokeLambda then the invoker triggers a cron lambda to run instantly. I'm attempting to run all of this locally and it worked in the past, but I haven't found a reason that enabled it to work based on the current issue I'm encountering. I wouldn't consider it intermittent, but something takes place where AWS can load the credentials properly. I think I got lucky by doing specific unknown action versus it magically gets the credentials or it doesn't. @ajredniwja I can hop on a call and we can do debugging together if necessary.
Update: The lambda system does work in the AWS test environment, this issue only occurs for me locally.
I also took another route trying SQS/SNS locally. Got all the streams and connection points tied together using AWS CLI.
@ajredniwja I'm able to reproduce this in at least two different ways now.
FYI, it may not be a reasonable solution for all, but I confirmed that ECS Fargate works just fine using the v3 AWS Node SDK which came out in General Availability on 12/15: https://aws.amazon.com/blogs/developer/modular-aws-sdk-for-javascript-is-now-generally-available/ https://github.com/aws/aws-sdk-js-v3
I'll follow up with the fix for me:
I needed to explicitly set up aws configure in the Docker image. Even though my container had all the /.aws/ contents copied over, it wasn't enough for the AWS-SDK to pick it up 'magically'. I suggest ensuring your environment has a profile configured explicitly in the place where you're running your function through the AWS-CLI. This solution resolved the issue for both HTTP and SNS/SQS.
I use environment variables to pass in the AWS keys, and following the naming convention from their docs solved the problem for me. The SDK will automatically detect and load these environment variables:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN
Reference: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/loading-node-credentials-environment.html
Docker Image: node:14.15.4-buster aws-sdk: 2.789.0
Still having this issue with 2.876.0. Is there a way to install aws-sdk v3 via npm?
UPDATE: I fixed it by setting task_role_arn in aws_ecs_task_definition.
We upgraded from NodeJS 12 to 14 and had a successful run after that. We cannot say whether this is just coincidental or whether it is due to the new NodeJS version.
UPDATE: The problem appeared again, so NodeJS 14 is not the solution. 😞

Docker node version: node:14.15.4-buster, aws-sdk: 2.940.0. It still doesn't work for me.
UPDATE: I got it working! I'm using docker-compose, so I tried setting volumes in my docker-compose.yml file, and it works.
volumes:
- /home/ubuntu/.aws:/root/.aws
The left side is the path outside the container and the right side is the path inside it, so inside the container this resolves to ~/.aws/credentials. Hope it also works for you.
![]()
I run my code fine on my computer, but get this error when I'm using EC2. Docker node version: node:14.15.4-buster, aws-sdk: 2.940.0. It still doesn't work for me.
Hey, just wanted to say that your AWS creds are visible in your image. I recommend revoking them :)
Also, I'm having the same issue as you. Only I'm running an EKS cluster on Fargate and am getting this issue with my pods. I don't run into this issue on an EC2 Node Group though.
*** Update
In my case, we were using Terraform to provision everything in AWS. We use Fargate and IRSA to give our containers permission. What ended up being the issue was that when you create an EKS cluster and an Identity Provider, Terraform will not populate the thumbprint list for the identity provider. We ended up having to populate it ourselves with a TLS certificate.
If you create everything through the AWS management console the thumbprint list is populated automatically for you.
So basically if you have the same error as me, check the thumbprint list of the identity provider.
Hope this helps.
seeing this occasionally in some task too