amazon-ecs-exec-checker
ECS execute-command failed due to an internal error.
Hi there, I'm trying to run execute-command to open an interactive shell against my ECS Fargate task. I'm using this checker to validate my configuration:
$ bash <( curl -Ls https://raw.githubusercontent.com/aws-containers/amazon-ecs-exec-checker/main/check-ecs-exec.sh ) clusterName cf41c924968e426c9be535f3f47545be
-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
jq | OK (/usr/bin/jq)
AWS CLI | OK (/usr/local/bin/aws)
-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
AWS CLI Version | OK (aws-cli/2.4.0 Python/3.8.8 Linux/5.11.0-40-generic exe/x86_64.ubuntu.20 prompt/off)
Session Manager Plugin | OK (1.2.205.0)
-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-south-1
Cluster: clusterName
Task : cf41c924968e426c9be535f3f47545be
-------------------------------------------------------------
Cluster Configuration | Audit Logging Not Configured
Can I ExecuteCommand? | arn:aws:iam::ACCOUNT_ID:role/ADMIN_ROLE_NAME
ecs:ExecuteCommand: allowed
ssm:StartSession denied?: allowed
Task Status | RUNNING
Launch Type | Fargate
Platform Version | 1.4.0
Exec Enabled for Task | OK
Container-Level Checks |
----------
Managed Agent Status
----------
1. RUNNING for "taskName"
----------
Init Process Enabled (taskName:1)
----------
1. Enabled - "taskName"
----------
Read-Only Root Filesystem (taskName:1)
----------
1. Disabled - "taskName"
Task Role Permissions | arn:aws:iam::ACCOUNT_ID:role/taskName-ecs-task
ssmmessages:CreateControlChannel: allowed
ssmmessages:CreateDataChannel: allowed
ssmmessages:OpenControlChannel: allowed
ssmmessages:OpenDataChannel: allowed
VPC Endpoints |
Found existing endpoints for vpc-ID:
- com.amazonaws.eu-south-1.ssm
- com.amazonaws.eu-south-1.ec2messages
- com.amazonaws.eu-south-1.ssmmessages
However, I'm getting TargetNotConnectedException. I've also opened an issue here.
Am I missing something..?
Hi, @nic-russo! Thank you for reaching out to us here.
In general, TargetNotConnectedException indicates that the required connection between the managed agent running in your task container and SSM Session Manager has not been established.
Supposing there is no bug in the exec checker script itself, could you possibly check/try the following?
- VPC endpoint policies, if the VPC endpoints above have any VPC endpoint policy
- Wait a few minutes and try again (because managed agents regularly try to reconnect)
- Stop the task and try a new task (if the task is under an ECS service)
- Update the Session Manager plugin to the latest version (1.2.279.0 is the latest as of Dec. 1st 2021)
- Check the CloudTrail logs for any failed API calls related to ECS and SSM Session Manager
Also, there are known limitations in the exec checker script that are worth keeping in mind while debugging this issue:
- The script doesn't support IAM roles/policies that use (1) Conditions or (2) IAM permission boundaries. In this case you need to check manually that ~(a) your IAM role ("role/ADMIN_ROLE_NAME" in the script result) is NOT restricted from calling the ExecuteCommand API, and (b)~ the task role ("role/taskName-ecs-task") is NOT restricted from calling SSM Session Manager APIs.
(Comment updated since it looks like your IAM user was at least already able to call the ExecuteCommand API)
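The "managed agent" part of the checklist above can also be checked directly with the AWS CLI. The following is a minimal sketch, not part of the checker script: the function names `managed_agent_status` and `all_agents_running` are hypothetical, the cluster and task values are placeholders you supply, and it assumes each container reports a single managed agent (the one used by ECS Exec) in the `describe-tasks` response.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; cluster/task are placeholders.

# Print one tab-separated line per container:
#   <container name> <managed agent name> <managed agent lastStatus>
# Assumes the first managed agent of each container is the ECS Exec agent.
managed_agent_status() {
  local cluster="$1" task="$2"
  aws ecs describe-tasks \
    --cluster "$cluster" \
    --tasks "$task" \
    --query 'tasks[0].containers[].[name, managedAgents[0].name, managedAgents[0].lastStatus]' \
    --output text
}

# Read the lines produced above on stdin.
# Returns 0 only if every non-empty line ends in RUNNING.
all_agents_running() {
  local line
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    case "$line" in
      *RUNNING) ;;        # agent reports RUNNING, keep going
      *) return 1 ;;      # anything else (STOPPED, PENDING, None) fails
    esac
  done
  return 0
}

# Example usage (placeholders):
#   managed_agent_status clusterName cf41c924968e426c9be535f3f47545be | all_agents_running \
#     && echo "agents running" || echo "at least one agent is not running"
```

If the task belongs to an ECS service, replacing it with a fresh task can be done with `aws ecs update-service --cluster <cluster> --service <service> --force-new-deployment`, where both values are again placeholders.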
Hi @toricls, thanks for your support! I went through your list:
- The com.amazonaws.eu-south-1.ssmmessages VPC endpoint has the following policy:
{
"Statement": [
{
"Action": "*",
"Effect": "Allow",
"Resource": "*",
"Principal": "*"
}
]
}
- (3. 4.) Done, no luck (yes, I'm running the task in an ECS service on Fargate).
5. I see 2 events in CloudTrail:
Event Name: ExecuteCommand
{
"eventVersion": "1.08",
"userIdentity": {
"type": "AssumedRole",
"principalId": "ID_HERE:botocore-session-1638379635",
"arn": "arn:aws:sts::ACCOUNT_ID:assumed-role/ADMIN_ROLE/botocore-session-1638379635",
"accountId": "ACCOUNT_ID",
"accessKeyId": "ACCOUNT_KEY",
"sessionContext": {
"sessionIssuer": {
"type": "Role",
"principalId": "ID_HERE",
"arn": "arn:aws:iam::ACCOUNT_ID:role/ADMIN_ROLE",
"accountId": "ACCOUNT_ID",
"userName": "ADMIN_ROLE"
},
"webIdFederationData": {},
"attributes": {
"creationDate": "2021-12-01T17:27:16Z",
"mfaAuthenticated": "false"
}
}
},
"eventTime": "2021-12-01T17:41:56Z",
"eventSource": "ecs.amazonaws.com",
"eventName": "ExecuteCommand",
"awsRegion": "eu-south-1",
"sourceIPAddress": "IP_HERE",
"userAgent": "aws-cli/2.4.0 Python/3.8.8 Linux/5.11.0-40-generic exe/x86_64.ubuntu.20 prompt/off command/ecs.execute-command",
"errorCode": "ClientException",
"errorMessage": "The execute command failed due to an internal error. Try again later.",
"requestParameters": {
"cluster": "clusterName",
"container": "containerName",
"command": "/bin/bash",
"interactive": true,
"task": "bc72af51a5d942519202ed10342ef307"
},
"responseElements": null,
"requestID": "340b94d2-ac03-4c4f-a386-e8728b95adc0",
"eventID": "c3503af0-3fa1-41e7-a447-e5ee70055641",
"readOnly": false,
"eventType": "AwsApiCall",
"managementEvent": true,
"recipientAccountId": "ACCOUNT_ID",
"eventCategory": "Management"
}
Event Name: StartSession
{
"eventVersion": "1.08",
"userIdentity": {
"type": "AssumedRole",
"principalId": "ID_HERE:ecs-execute-command",
"arn": "arn:aws:sts::ACCOUNT_ID:assumed-role/AWSServiceRoleForECS/ecs-execute-command",
"accountId": "ACCOUNT_ID",
"accessKeyId": "ACCOUNT_KEY",
"sessionContext": {
"sessionIssuer": {
"type": "Role",
"principalId": "ID_HERE",
"arn": "arn:aws:iam::ACCOUNT_ID:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS",
"accountId": "ACCOUNT_ID",
"userName": "AWSServiceRoleForECS"
},
"webIdFederationData": {},
"attributes": {
"creationDate": "2021-12-01T17:41:56Z",
"mfaAuthenticated": "false"
}
},
"invokedBy": "ecs.amazonaws.com"
},
"eventTime": "2021-12-01T17:41:56Z",
"eventSource": "ssm.amazonaws.com",
"eventName": "StartSession",
"awsRegion": "eu-south-1",
"sourceIPAddress": "ecs.amazonaws.com",
"userAgent": "ecs.amazonaws.com",
"errorCode": "TargetNotConnected",
"errorMessage": "ecs:aws-monitor_bc72af51a5d942519202ed10342ef307_bc72af51a5d942519202ed10342ef307-1708420469 is not connected.",
"requestParameters": {
"target": "ecs:aws-monitor_bc72af51a5d942519202ed10342ef307_bc72af51a5d942519202ed10342ef307-1708420469",
"documentName": "AmazonECS-ExecuteInteractiveCommand",
"parameters": {
"cloudWatchLogGroupName": [
"ECS_aws-monitor"
],
"command": [
"/bin/bash"
]
}
},
"responseElements": null,
"requestID": "57d9cf3e-25d6-470a-9f0c-cad8a12d3778",
"eventID": "a2da3ef3-2f00-49ee-940f-eccff1fb13c1",
"readOnly": false,
"eventType": "AwsApiCall",
"managementEvent": true,
"recipientAccountId": "ACCOUNT_ID",
"eventCategory": "Management"
}
6. The task role has the AmazonSSMManagedInstanceCore policy and some unrelated ones.
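For anyone repeating the CloudTrail step, the two events above can also be pulled from the command line instead of the console. This is a rough sketch under a few assumptions: the function names are hypothetical, the region is a placeholder, and the error codes are extracted with `grep`/`sed` rather than a JSON parser, which is only good enough for eyeballing results.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; the region is a placeholder.

# Fetch the raw JSON of recent CloudTrail events with the given event name,
# e.g. ExecuteCommand or StartSession.
recent_events() {
  local region="$1" event_name="$2"
  aws cloudtrail lookup-events \
    --region "$region" \
    --lookup-attributes "AttributeKey=EventName,AttributeValue=$event_name" \
    --max-results 10 \
    --query 'Events[].CloudTrailEvent' \
    --output text
}

# Read event JSON on stdin and print the distinct errorCode values found.
# Text matching only: it does not validate or fully parse the JSON.
extract_error_codes() {
  grep -oE '"errorCode"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/.*:[[:space:]]*"([^"]*)"$/\1/' \
    | sort -u
}

# Example usage (placeholder region):
#   recent_events eu-south-1 StartSession | extract_error_codes
```

In the events posted above, this kind of filter would surface `ClientException` for ExecuteCommand and `TargetNotConnected` for StartSession.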
Any movement on this issue?
Following, having the same issue. A little additional info:
We manage our infra with terraform and have ~8 identical deployments with literally no variation besides names and we're seeing this issue intermittently on random deployments.
I'm also apparently experiencing this error intermittently. I am managing my infra via Terraform as well. Despite rolling back to previous topologies, I'm still seeing this on tasks from some services and not others.
I am facing the same issue. Tasks created prior to March 29 had no problems. (This may be a coincidence, but it seems to coincide with the release date of v3.1.1188.0 of SSM Agent.)
Same issue here as of yesterday
This looks related to https://github.com/aws-containers/amazon-ecs-exec-checker/issues/49.
Please check if you have the environment variables AWS_ACCESS_KEY / AWS_SECRET_ACCESS_KEY set and if unsetting those solves this issue.
I had the same issue as well and, as mentioned by @tim-finnigan, changing some ENV vars called AWS_ACCESS_KEY / AWS_SECRET_ACCESS_KEY to different variable names ended up solving the issue for us as well.
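A quick way to check for this is to list the environment variable names in the task definition and flag the credential-style ones. The sketch below is illustrative only: the function names are hypothetical, the task definition name is a placeholder, and `AWS_ACCESS_KEY_ID` is included as an assumption on my part (the thread only names `AWS_ACCESS_KEY` and `AWS_SECRET_ACCESS_KEY`). It only sees variables set in the `environment` section of the task definition, not ones injected through secrets or set inside the image.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; the task definition is a placeholder.

# Print the names of all environment variables defined on the containers
# of a task definition (whitespace separated).
container_env_names() {
  local task_def="$1"
  aws ecs describe-task-definition \
    --task-definition "$task_def" \
    --query 'taskDefinition.containerDefinitions[].environment[].name' \
    --output text
}

# Read variable names on stdin and print the ones that look like
# AWS credential variables, one per line.
# AWS_ACCESS_KEY_ID is an assumption; the thread mentions the other two.
find_credential_vars() {
  local name
  tr -s '[:space:]' '\n' | while IFS= read -r name; do
    case "$name" in
      AWS_ACCESS_KEY|AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY)
        printf '%s\n' "$name" ;;
    esac
  done
}

# Example usage (placeholder task definition name):
#   container_env_names my-task-def | find_credential_vars
```

An empty result means none of the listed names are set in the task definition itself.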
I was experiencing this same issue but found a fix.
Initially made sure:
- All checks passed in amazon-ecs-exec-checker
- Our ECS tasks did not have ENV vars set for AWS_ACCESS_KEY or AWS_SECRET_ACCESS_KEY
- The ECS agent was updated on the instance to latest (Agent version 1.61.3, Docker version 20.10.13).
Solution: After double-checking all settings (roles, permissions, etc.), I tried updating the AMI we were using on the instance. That fixed the issue, and I was able to successfully run execute-command on the task!
The AMI that did not allow for execute command:
- AMI ID: ami-0a5e7c9183d1cea27
- AMI name: amzn2-ami-ecs-hvm-2.0.20220209-x86_64-ebs
Updated to this AMI which does allow for execute command:
- AMI ID: ami-040d909ea4e56f8f3
- AMI name: amzn2-ami-ecs-hvm-2.0.20220630-x86_64-ebs
(Which is currently latest via https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
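For the EC2 launch type, the AMI comparison above can be scripted against the public SSM parameter that AWS publishes for the recommended ECS-optimized Amazon Linux 2 AMI. This is a minimal sketch with hypothetical function names and a placeholder region. AMI IDs are region-specific, so the IDs quoted above only apply to the region they were taken from, and this does not apply to Fargate tasks.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; the region is a placeholder.

# Print the currently recommended ECS-optimized Amazon Linux 2 AMI ID
# for the given region.
recommended_ecs_ami() {
  local region="$1"
  aws ssm get-parameters \
    --region "$region" \
    --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id \
    --query 'Parameters[0].Value' \
    --output text
}

# Compare the AMI an instance was launched from with the recommended one.
# Returns 0 if they match, 1 if not, 2 if the lookup itself failed.
ami_is_current() {
  local current="$1" region="$2" recommended
  recommended="$(recommended_ecs_ami "$region")" || return 2
  if [ "$current" = "$recommended" ]; then
    echo "up to date: $current"
  else
    echo "outdated: $current (recommended: $recommended)"
    return 1
  fi
}

# Example usage (placeholder region):
#   ami_is_current ami-0a5e7c9183d1cea27 us-east-1
```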