containers-roadmap
[ECS-Fargate] [BUG]: Container depends on is not respected in 1.4.0
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
If I deploy a task on platform version 1.3.0 that specifies a container start order, the order is respected. On 1.4.0, all containers in the task start immediately.
Which service(s) is this request for?
Fargate
Hello,
Thanks for reaching out to us regarding this issue. Please note that the health status updates that you see in the describe-tasks API response are asynchronous and do not always accurately reflect when the health status of the container actually changed. Because of the underlying differences between platform versions 1.3 and 1.4, you might see a different cadence at which this status gets updated. On either platform version, we do not recommend relying on the describe-tasks API response to determine when the container health status actually changed.
You can, however, validate that health status and container dependency ordering work as expected by looking at either your application logs or the container start and stop timestamps that the ECS task metadata service returns to one of the containers within the task. If you still think that it's not working as expected, we'd be happy to take another look at this.
Thanks, Anirudh
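For anyone who wants to try the timestamp check suggested above, here is a minimal Python sketch. It assumes the task metadata endpoint version 4 is available to the container through the ECS_CONTAINER_METADATA_URI_V4 environment variable, and that the /task response lists each container under "Containers" with "Name" and "StartedAt" fields holding UTC timestamps that end in "Z". The helper names (parse_ts, start_order, fetch_task_metadata) are made up for this example.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp ending in 'Z' (an assumption about the payload).

    Fractional seconds may carry more than six digits, so anything past
    microsecond precision is truncated before parsing.
    """
    ts = ts.rstrip("Z")
    if "." in ts:
        head, frac = ts.split(".", 1)
        ts = f"{head}.{frac[:6]}"
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)


def start_order(task_metadata: dict) -> list:
    """Return (container name, start time) pairs, earliest start first.

    Containers without a StartedAt value (not started yet) are skipped.
    """
    started = [
        (c["Name"], parse_ts(c["StartedAt"]))
        for c in task_metadata.get("Containers", [])
        if c.get("StartedAt")
    ]
    return sorted(started, key=lambda item: item[1])


def fetch_task_metadata() -> dict:
    """Fetch task metadata from inside a running container of the task."""
    base = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{base}/task", timeout=5) as resp:
        return json.load(resp)


# Only attempt the network call when the endpoint variable is present.
if __name__ == "__main__" and "ECS_CONTAINER_METADATA_URI_V4" in os.environ:
    for name, started_at in start_order(fetch_task_metadata()):
        print(f"{started_at.isoformat()}  {name}")
```

Run from inside one of the task's containers, this prints the containers in the order they actually started, which can be compared against the dependsOn configuration without relying on describe-tasks.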
@aaithal we have encountered this same issue. It would be nice if this could be looked into
(Found this doing internet searches trying to figure out an issue I ran into but ended up being unrelated.)
I see the DependsOn working now.
awesome! Thanks for commenting @thedannywilcox
@thedannywilcox I just got back to working on our ECS services, and I do not see dependsOn working. Are you sure a fix was rolled out for this? And are you running on 1.14?
@ConradKurth do you mean Fargate 1.4.0? If so, yup
@thedannywilcox thanks for getting back to me so quickly! Maybe I am doing something wrong, let me do some more digging. For context for anyone who comes across this: I am running a sidecar which pipes logs to a third party. The sidecar log sink needs to be up before the service container so that the service's logs can be sent.
@ConradKurth Did you get this working? I have the exact same issue. We run an application container and, in the same task, a Datadog agent container and a fluentbit log router container. Both of those need to be up before the application container and go down after it, so that logs and metrics make it to Datadog. I have a dependsOn directive with the condition set to START for the fluentbit and Datadog containers:
"dependsOn": [
  {
    "containerName": "datadog-agent",
    "condition": "START"
  },
  {
    "containerName": "fluentbit_log_router",
    "condition": "START"
  }
],
Logs get to Datadog intermittently and the only way I'm getting our logging to work reliably is to put a sleep in the code of the application container.
What if you add a health check and specify condition=HEALTHY? That was how I got it to work because I wanted the agent actually working before it accepts traffic, not just ECS saying it was started.
Yes, after posting this, that is what I implemented. The START condition made them start 2 seconds apart, but fluentbit and Datadog weren't ready within 2 seconds. It all works as expected with a condition of HEALTHY and a health check for both. I think this GitHub issue should be closed.
The full solution looks like this:
dependsOn directive (in Terraform) for the application container:
container_depends_on = [
  {
    containerName = "datadog-agent"
    condition     = "HEALTHY"
  },
  {
    containerName = "fluentbit_log_router"
    condition     = "HEALTHY"
  }
]
Health check for the Datadog agent (in Terraform):
healthcheck = {
  command     = ["CMD-SHELL", "agent health"]
  retries     = 5
  timeout     = 10
  interval    = 5
  startPeriod = 5
}
Health check for fluentbit (in Terraform):
healthcheck = {
  command     = ["CMD-SHELL", "echo '{\"health\": \"check\"}' | nc 127.0.0.1 8877 || exit 1"]
  retries     = 5
  timeout     = 10
  interval    = 5
  startPeriod = 5
}
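For readers who are not using a Terraform module, the same settings can be sketched as raw ECS container definitions. The Python below builds them with the task definition keys dependsOn and healthCheck and adds a small check that every HEALTHY dependency points at a container that defines a health check. The image arguments, the "app" container name, and the helper names are placeholders invented for this example; the commands and timing values are copied from the thread.

```python
import json


def health_check(command: str) -> dict:
    """Health check block using the timing values quoted in the thread."""
    return {
        "command": ["CMD-SHELL", command],
        "interval": 5,
        "timeout": 10,
        "retries": 5,
        "startPeriod": 5,
    }


def container_definitions(app_image: str, agent_image: str, router_image: str) -> list:
    """Build two sidecars plus an app container that waits for both to be HEALTHY."""
    sidecars = [
        {
            "name": "datadog-agent",
            "image": agent_image,  # placeholder image reference
            "essential": True,
            "healthCheck": health_check("agent health"),
        },
        {
            "name": "fluentbit_log_router",
            "image": router_image,  # placeholder image reference
            "essential": True,
            "healthCheck": health_check(
                "echo '{\"health\": \"check\"}' | nc 127.0.0.1 8877 || exit 1"
            ),
        },
    ]
    app = {
        "name": "app",  # hypothetical name for the application container
        "image": app_image,
        "essential": True,
        "dependsOn": [
            {"containerName": s["name"], "condition": "HEALTHY"} for s in sidecars
        ],
    }
    return sidecars + [app]


def missing_health_checks(definitions: list) -> list:
    """List HEALTHY dependencies whose target container has no healthCheck."""
    with_check = {d["name"] for d in definitions if "healthCheck" in d}
    return [
        (d["name"], dep["containerName"])
        for d in definitions
        for dep in d.get("dependsOn", [])
        if dep["condition"] == "HEALTHY" and dep["containerName"] not in with_check
    ]


if __name__ == "__main__":
    defs = container_definitions("app:latest", "agent:latest", "router:latest")
    print(json.dumps(defs, indent=2))
```

The missing_health_checks helper covers the pitfall discussed above: a HEALTHY condition only helps if the container being waited on actually defines a health check.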
Thanks for your help in getting this sorted out, @anitakrueger. Closing this issue per your recommendation and because Fargate PV 1.4 is respecting the container dependsOn field.