containers-roadmap

[ECS-Fargate] [BUG]: Container depends on is not respected in 1.4.0

Open SecretPenguin1 opened this issue 4 years ago • 7 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

If I deploy a task on 1.3.0 that has a container order specified, the order is respected. On 1.4.0, all containers in the task start immediately.

Which service(s) is this request for?

Fargate

SecretPenguin1 avatar Apr 23 '20 11:04 SecretPenguin1

Hello,

Thanks for reaching out to us regarding this issue. Please note that the health status updates that you see in the describe-tasks API response are asynchronous and do not always accurately reflect when the health status for the container actually changed. Because of the underlying differences between platform versions 1.3 and 1.4, you might see a different cadence at which this status gets updated. On either platform version, we do not recommend relying on the describe-tasks API response to determine when the container health status actually changed.

You can, however, validate that health status and container dependency ordering work as expected by looking at either your application logs or the container start and stop timestamps from the ECS task metadata service, queried from one of the containers within the task. If you still think that it's not working as expected, we'd be happy to take another look at this.
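As a rough illustration of the timestamp check described above, here is a minimal Python sketch. It assumes the task metadata endpoint version 4 is available (the `ECS_CONTAINER_METADATA_URI_V4` environment variable is set inside the container) and that each entry in `Containers` carries a `Name` and, once started, a `StartedAt` timestamp in UTC with a trailing `Z`. The function names are hypothetical and not part of any AWS SDK.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone


def fetch_task_metadata():
    """Fetch task metadata; only works from inside a running ECS task."""
    # Assumption: the v4 endpoint variable is injected by the platform.
    base = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{base}/task", timeout=2) as resp:
        return json.load(resp)


def parse_ts(ts):
    """Convert a UTC timestamp such as 2022-11-08T19:00:03.2Z to epoch seconds."""
    # The fractional part can have a variable number of digits,
    # so it is parsed separately instead of relying on strptime.
    whole, _, frac = ts.rstrip("Z").partition(".")
    base = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return base.timestamp() + (float("0." + frac) if frac else 0.0)


def start_order(metadata):
    """Return (container name, start time) pairs sorted by start time."""
    started = [
        (c["Name"], parse_ts(c["StartedAt"]))
        for c in metadata.get("Containers", [])
        if c.get("StartedAt")  # skip containers that have not started yet
    ]
    return sorted(started, key=lambda item: item[1])
```

Calling `start_order(fetch_task_metadata())` from one of the containers in the task lists the containers in the order they actually started, which can then be compared against the dependsOn configuration.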

Thanks, Anirudh

aaithal avatar Jun 22 '20 18:06 aaithal

@aaithal we have encountered this same issue. It would be nice if this could be looked into

ConradKurth avatar Dec 31 '21 22:12 ConradKurth

(Found this while searching the internet for an issue I ran into, which ended up being unrelated.)

I see the DependsOn working now.

thedannywilcox avatar Apr 11 '22 18:04 thedannywilcox

awesome! Thanks for commenting @thedannywilcox

ConradKurth avatar May 16 '22 17:05 ConradKurth

@thedannywilcox I just got around to working on our ECS services again, and I do not see dependsOn working. Are you sure a fix was rolled out for this? And are you running on 1.14?

ConradKurth avatar Jul 08 '22 14:07 ConradKurth

@ConradKurth do you mean fargate 1.4.0? If so, yup

thedannywilcox avatar Jul 08 '22 14:07 thedannywilcox

@thedannywilcox thanks for getting back to me so quickly! Maybe I am doing something wrong, let me do some more digging. For context, for anyone who comes across this: I am running a sidecar which pipes logs to a third party. The sidecar log sink needs to be up before the service container so it can send them.

ConradKurth avatar Jul 08 '22 17:07 ConradKurth

@ConradKurth Did you get this working? I have the exact same issue. We run an application container and, in the same task, a datadog agent container and a fluentbit log router container. Both of those need to be up before the application container and down after it so that logs and metrics make it to Datadog. I have a dependsOn directive with the condition set to START for the fluentbit and datadog containers.

            "dependsOn": [
                {
                    "containerName": "datadog-agent",
                    "condition": "START"
                },
                {
                    "containerName": "fluentbit_log_router",
                    "condition": "START"
                }
            ],

Logs get to Datadog intermittently and the only way I'm getting our logging to work reliably is to put a sleep in the code of the application container.

anitakrueger avatar Nov 08 '22 19:11 anitakrueger

What if you add a health check and specify condition=HEALTHY? That was how I got it to work because I wanted the agent actually working before it accepts traffic, not just ECS saying it was started.

thedannywilcox avatar Nov 08 '22 19:11 thedannywilcox

Yes, after posting this, that is what I have now implemented. The START condition made the containers start 2 seconds apart, but fluentbit and datadog weren't ready in 2 seconds. It all works as expected with a condition of HEALTHY and a healthcheck for both. I think this GitHub issue should be closed.

The full solution looks like this:

depends on directive (in terraform) for the application container:

  container_depends_on = [
    {
      containerName = "datadog-agent"
      condition     = "HEALTHY"
    },
    {
      containerName = "fluentbit_log_router"
      condition     = "HEALTHY"
    }
  ]

healthcheck for datadog agent (in terraform):

  healthcheck = {
    command     = ["CMD-SHELL", "agent health"]
    retries     = 5
    timeout     = 10
    interval    = 5
    startPeriod = 5
  }

healthcheck for fluentbit (in terraform):

  healthcheck = {
    command     = ["CMD-SHELL", "echo '{\"health\": \"check\"}' | nc 127.0.0.1 8877 || exit 1"]
    retries     = 5
    timeout     = 10
    interval    = 5
    startPeriod = 5
  }
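The HEALTHY condition only works when the container being depended on has a health check configured. The following Python sketch checks a list of container definitions for that mismatch. It assumes the definitions use the ECS task definition JSON field names (`name`, `dependsOn`, `containerName`, `condition`, `healthCheck`); the function name `missing_health_checks` and the container name `app` are hypothetical.

```python
def missing_health_checks(container_definitions):
    """Return (container, dependency) pairs where dependsOn uses HEALTHY
    but the dependency has no healthCheck defined."""
    with_check = {c["name"] for c in container_definitions if c.get("healthCheck")}
    problems = []
    for c in container_definitions:
        for dep in c.get("dependsOn", []):
            if dep.get("condition") == "HEALTHY" and dep["containerName"] not in with_check:
                problems.append((c["name"], dep["containerName"]))
    return problems


# Illustrative input: "app" is a hypothetical application container name.
definitions = [
    {
        "name": "app",
        "dependsOn": [
            {"containerName": "datadog-agent", "condition": "HEALTHY"},
            {"containerName": "fluentbit_log_router", "condition": "HEALTHY"},
        ],
    },
    {
        "name": "datadog-agent",
        "healthCheck": {
            "command": ["CMD-SHELL", "agent health"],
            "retries": 5,
            "timeout": 10,
            "interval": 5,
            "startPeriod": 5,
        },
    },
    # No healthCheck here, so the check below reports this dependency.
    {"name": "fluentbit_log_router"},
]

print(missing_health_checks(definitions))
```

An empty result means every HEALTHY dependency points at a container that defines a health check.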

anitakrueger avatar Nov 09 '22 12:11 anitakrueger

Thanks for the help in getting this sorted out, @anitakrueger. Closing this issue per your recommendation and because Fargate platform version 1.4 is respecting the container dependsOn field.

alexcmms avatar Sep 27 '23 23:09 alexcmms