AWS ECS agent does not start in EC2 instance
Summary: AWS ECS agent does not start in EC2 instance.
Reopened: https://github.com/aws/amazon-ecs-agent/issues/4130 Check all comments for detailed information.
Hello Thiago
I found a workaround that uses only the user_data script, so there is no need to reboot the instances. Rebooting is not an option if, like me, you are using an ASG, because it will terminate them.
Big thanks to garysferrao for the idea. I connected to the EC2 instance and restarted cloud-final.service with this command:
sudo systemctl restart cloud-final.service
That worked well: ECS detected the instances. So the solution for me was to do the same thing from user_data without creating an infinite loop (restarting cloud-final.service runs the user_data script again, which would restart the service again, and so on).
The solution: create a systemd service in the user_data script that starts after cloud-final.target. The user_data script runs only once: it creates the systemd service and enables it. From then on, systemd manages the startup of the ECS service, running it only once at boot time.
The user_data script:
#!/bin/bash
# Configure ECS
echo "ECS_CLUSTER=${ECS_CLUSTER}" > /etc/ecs/ecs.config
# Create the systemd service to start the ECS agent after cloud-final
cat > /etc/systemd/system/ecs-start.service <<EOF
[Unit]
Description=Start ECS Agent after cloud-init
After=cloud-final.target
Requires=cloud-final.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c "systemctl enable ecs && systemctl start ecs"
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
# Reload systemd to take the new service into account
systemctl daemon-reload
systemctl enable ecs-start.service
It worked for me, so I hope it works for you too 😄
Thanks for sharing your workaround, @akayesb. I'll keep the issue open so a proper solution can be implemented.
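For anyone trying the script above, here is a rough sketch of a post-boot check to run over SSH or SSM. It is not from the thread: the function name `check_ecs_startup` is made up for illustration, and it assumes the unit name `ecs-start.service` from the script above and the agent's default introspection port 51678.

```shell
#!/bin/bash
# Illustrative post-boot check (hypothetical helper, not part of the workaround).
# Assumes: unit name ecs-start.service, agent introspection on localhost:51678.

check_ecs_startup() {
  # cloud-init should report "done"; skipped if the CLI is not installed.
  if command -v cloud-init >/dev/null 2>&1; then
    cloud-init status
  fi

  # The helper unit must be enabled so that it runs on boot.
  echo "ecs-start enabled: $(systemctl is-enabled ecs-start.service 2>&1)"

  # The agent service itself should be active.
  echo "ecs active: $(systemctl is-active ecs 2>&1)"

  # Jobs stuck in "waiting" hint at an ordering problem between units.
  systemctl list-jobs --no-pager

  # The introspection endpoint only answers once the agent is up.
  if curl -s --max-time 5 http://localhost:51678/v1/metadata; then
    echo
    return 0
  fi
  echo "ECS agent introspection endpoint not reachable" >&2
  return 1
}
```

Calling `check_ecs_startup` on the instance should print the cluster name in the metadata JSON once the agent has registered; a non-zero return means the agent is still not answering.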
Hello! Thanks for opening this issue. Just to summarize what's going on, we're seeing that the ECS agent can't start up because it's waiting for cloud-init to finish. On the ECS side, this is the intended behavior: we want the agent to start up only after both the docker and cloud-final services have finished their boot-up.
Just curious, is there anything special defined in your EC2 userdata, or can the issue be reproduced even with an empty userdata on an ECS Optimized AMI? Also, is there a specific ECS AMI version where this happens more often?
Feel free to also collect agent logs using our log collector script and forward it over to [email protected] for us to take a look.
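A rough sketch of how the details asked for above (AMI ID, unit ordering, cloud-final logs) could be gathered on an affected instance. This is only an illustration, not the log collector script: the function name `collect_ecs_report_info` is hypothetical, and it assumes IMDSv2 is reachable and that the agent unit is named `ecs`.

```shell
#!/bin/bash
# Illustrative helper (hypothetical name) to gather details for a bug report.
# Assumes: IMDSv2 reachable at 169.254.169.254, agent unit named "ecs".

collect_ecs_report_info() {
  local token ami

  # IMDSv2: request a short-lived session token, then query the AMI ID.
  token="$(curl -s --max-time 5 -X PUT \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
    http://169.254.169.254/latest/api/token)"
  ami="$(curl -s --max-time 5 \
    -H "X-aws-ec2-metadata-token: ${token}" \
    http://169.254.169.254/latest/meta-data/ami-id)"
  echo "ami-id: ${ami}"

  # Units the agent is ordered after (should list docker and cloud-final).
  systemctl show ecs --property=After

  # Last cloud-final log lines from the current boot.
  journalctl -b -u cloud-final.service --no-pager | tail -n 20
}
```

Running `collect_ecs_report_info > ecs-report.txt` would give a small text file to attach alongside the agent logs.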