
AWS ECS agent does not start in EC2 instance

Open thiagoscodelerae opened this issue 1 year ago • 13 comments

Summary

AWS ECS agent does not start in EC2 instance

Description

It looks like there might be an issue with the ECS agent on my ECS cluster. For the past two weeks, my ECS cluster with EC2 instances managed by auto scaling (launch templates) and capacity provider has been working fine. However, new instances are not being connected to the ECS cluster because the agent is not starting anymore.

Even when I try to start the ECS agent manually on the instance, it hangs.

The Docker service is running properly, and the proper ECS role is attached to the instance. There are no logs for the agent on the instance.

The AMI I'm using is "amzn2-ami-ecs-hvm-2.0.20240319-x86_64-ebs" with the ID "ami-06ebbcdf40f9949e7". I have already tried some newer AMI versions but am facing the same issue.

Here's the ECS service status on a freshly launched EC2 instance:

ecs.service - ECS Agent
   Loaded: loaded (/usr/lib/systemd/system/ecs.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

Expected Behavior

ECS agent service starting automatically.

Environment Details

  • docker info:
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.0.0+unknown)

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 5
 Server Version: 20.10.25
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 64b8a811b07ba6288238eefc14d898ee0b5b99ba
 runc version: 4bccb38cc9cf198d52bebf2b3a90cd14e7af8c06
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.14.336-257.566.amzn2.x86_64
 Operating System: Amazon Linux 2
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 14.91GiB
 Name: ip-10-0-4-66.us-west-2.compute.internal
 ID: Q2XT:HGAQ:XXEK:T7OZ:7C2Y:WJYW:44JQ:OWYE:VUG5:FSOB:QBAV:MPPK
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
  • curl http://localhost:51678/v1/metadata
"Version":"Amazon ECS Agent - v1.82.1

Supporting Log Snippets

  • journalctl -xeu ecs.service
---no entries---

thiagoscodelerae avatar Apr 01 '24 12:04 thiagoscodelerae

Hello Thiago,

Has anything changed since you posted this last week? Unfortunately, we can't diagnose the issue without any logs. For further investigation, we suggest using amazon-ecs-logs-collector and providing us with the logs. As a temporary mitigation, we suggest you try a few older AMI versions (since the new ones do not work based on your previous attempts).
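
Before running the full collector, a quick check of whether any agent logs exist at all can help. The following is a minimal sketch, assuming the usual log location on the ECS-optimized Amazon Linux 2 AMI (/var/log/ecs, with files named ecs-init.log and ecs-agent.log, possibly with a rotation suffix). The function name list_ecs_logs and its optional directory argument are made up for illustration and are not part of any AWS tooling.

```shell
# Sketch only: list ECS agent and ecs-init log files if they exist.
# list_ecs_logs is a hypothetical helper; /var/log/ecs is assumed to be
# the log directory and can be overridden with the first argument.
list_ecs_logs() {
  local log_dir="${1:-/var/log/ecs}"
  local found=0
  local f

  if [ ! -d "$log_dir" ]; then
    echo "no log directory: $log_dir"
    return 0
  fi

  # Unmatched globs stay literal, so skip anything that does not exist.
  for f in "$log_dir"/ecs-init.log* "$log_dir"/ecs-agent.log*; do
    [ -e "$f" ] || continue
    found=1
    echo "$f"
  done

  [ "$found" -eq 1 ] || echo "no agent logs in $log_dir"
}
```

Calling list_ecs_logs with no argument checks /var/log/ecs. If it reports no logs, that would be consistent with ecs.service never having been started, and the output of "systemctl status ecs docker" and "journalctl -u ecs -u docker" may then be more informative than the agent logs themselves.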

hozkaya2000 avatar Apr 10 '24 21:04 hozkaya2000

@hozkaya2000 thank you for your answer. I'll take a look at amazon-ecs-logs-collector. Regardless of the AMI version, the agent isn't working now, even though it was fine earlier. I have tried using an AMI that worked previously, but it is still not working.

thiagoscodelerae avatar Apr 11 '24 14:04 thiagoscodelerae

Closing this due to lack of activity. @thiagoscodelerae please reopen if you are still facing issues and can provide us with logs from your container instance.

amogh09 avatar May 06 '24 18:05 amogh09