aws-otel-collector
Fargate/ECS healthcheck
Describe the question
Hi all, I'm having trouble getting the health check to work on Fargate. I followed the instructions and added the collector as a sidecar, but I can't get the sidecar's health check to report healthy. As a result, my service keeps getting killed because ECS thinks the aws-otel-collector sidecar is unhealthy.
Steps to reproduce if your question is related to an action
The service is provisioned with CDK. The sidecar health check is specified as follows:
```typescript
healthCheck: {
  command: ["CMD-SHELL", "curl -f http://127.0.0.1:13133/ || exit 1"],
  timeout: Duration.seconds(10),
  startPeriod: Duration.seconds(10),
},
```
What did you expect to see?
The sidecar to be reported as healthy.
Additional context
Looking at the project's Dockerfile, it appears that aws-otel-collector is built FROM scratch, so the image won't have curl, or even a shell for that matter. How are health checks expected to be configured?
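To make the failure concrete, here is a minimal sketch of how the two ECS health-check command forms map onto the process the container runtime actually launches. It assumes Docker's Linux behavior, where `CMD-SHELL` runs its string via `/bin/sh -c`, and it stands in for the image's filesystem with a set of paths. `toExecArgv`, `missingExecutable`, and the `/collector` path are hypothetical names for illustration, not part of ECS, CDK, or the ADOT image.

```typescript
// Hypothetical helpers: how an ECS health-check command array becomes the
// process the container runtime executes (Docker-on-Linux behavior assumed).
type HealthCheckCommand = string[];

function toExecArgv(command: HealthCheckCommand): string[] {
  const [form, ...rest] = command;
  if (rest.length === 0) {
    throw new Error("health-check command needs at least one argument after the form");
  }
  if (form === "CMD-SHELL") {
    // Shell form: the remainder is one string run by the default shell.
    return ["/bin/sh", "-c", rest.join(" ")];
  }
  if (form === "CMD") {
    // Exec form: no shell involved; the first argument is the program to run.
    return rest;
  }
  throw new Error(`unsupported health-check form: ${form}`);
}

// Returns the program that would fail to launch, or null if it exists.
// `imageFiles` is a stand-in for the image's filesystem.
function missingExecutable(command: HealthCheckCommand, imageFiles: Set<string>): string | null {
  const program = toExecArgv(command)[0];
  return imageFiles.has(program) ? null : program;
}

// Hypothetical FROM-scratch image that contains only the collector binary.
const scratchImage = new Set<string>(["/collector"]);
const curlCheck: HealthCheckCommand = ["CMD-SHELL", "curl -f http://127.0.0.1:13133/ || exit 1"];

console.log(toExecArgv(curlCheck));
// The shell itself is missing; even with a shell, curl would be missing too.
console.log(missingExecutable(curlCheck, scratchImage)); // prints /bin/sh
```

The takeaway under these assumptions is that any `CMD-SHELL` check fails before it starts in a scratch image, while an exec-form (`CMD`) check only needs its first argument to exist in the image.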
Thanks
Could you please share the Collector config you used when setting up the ADOT Collector?
Hi,
Thanks for getting back to me. I just used the standard Container Insights config, e.g.:
```typescript
taskDefinition.addContainer("otelContainer", {
  image: ContainerImage.fromRegistry("public.ecr.aws/aws-observability/aws-otel-collector:latest"),
  command: ["--config=/etc/ecs/container-insights/otel-task-metrics-config.yaml"],
  essential: false,
  portMappings: [...],
  healthCheck: {
    command: ["CMD-SHELL", "curl -f http://127.0.0.1:13133/ || exit 1"],
    timeout: Duration.seconds(10),
    startPeriod: Duration.seconds(10),
  },
});
```
Currently I don't have any Collector CDK documentation to point you toward, so this may require some experimenting.
I can set up a similar environment and see what I can discover on my side. Is there any other information about your CDK environment that would be useful when I build out my own CDK deployment?
What version of CDK are you using?
The latest, v2.17.
I really don't think CDK has anything to do with it, though. Fundamentally, I'm unsure how the health check is supposed to run in the Fargate/ECS sidecar. Given that the health check command runs inside the sidecar container, and the otel image has no shell, curl, or similar tools, how can ECS ever consider it healthy?
The only option I believe I have for the health check definition is to use the shell, e.g. command: ["CMD-SHELL", ...
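The question of how ECS "considers it healthy" comes down to how it turns probe results into a container health status. Below is a simplified model, not ECS's actual implementation: it assumes probes run once per `interval` starting one interval after launch, that a timed-out probe counts as a failure, that `retries` consecutive failures mark the container `UNHEALTHY`, and that failures during `startPeriod` are ignored unless a probe has already succeeded. The defaults in the comments (interval 30 s, retries 3, no start period) are the ones documented for the ECS `HealthCheck` parameters; `healthStatusOverTime` is a hypothetical name.

```typescript
// Simplified model of ECS container health evaluation (not ECS's implementation).
type HealthStatus = "UNKNOWN" | "HEALTHY" | "UNHEALTHY";

interface HealthCheckSettings {
  intervalSeconds: number;    // ECS default: 30
  retries: number;            // ECS default: 3
  startPeriodSeconds: number; // ECS default: 0 (no grace period)
}

// outcomes[i] is whether probe i exited 0 within the timeout; probe i is
// assumed to run at (i + 1) * intervalSeconds after the container starts.
function healthStatusOverTime(settings: HealthCheckSettings, outcomes: boolean[]): HealthStatus[] {
  const statuses: HealthStatus[] = [];
  let status: HealthStatus = "UNKNOWN";
  let consecutiveFailures = 0;
  let seenSuccess = false;
  for (let i = 0; i < outcomes.length; i++) {
    const probeTime = (i + 1) * settings.intervalSeconds;
    const inStartPeriod = probeTime <= settings.startPeriodSeconds;
    if (outcomes[i]) {
      // A successful probe marks the container healthy and resets the streak.
      status = "HEALTHY";
      consecutiveFailures = 0;
      seenSuccess = true;
    } else if (seenSuccess || !inStartPeriod) {
      // Failures count once the start period is over, or after any success.
      consecutiveFailures += 1;
      if (consecutiveFailures >= settings.retries) {
        status = "UNHEALTHY";
      }
    }
    statuses.push(status);
  }
  return statuses;
}

// Settings from the issue: startPeriod 10 s, interval and retries left at ECS defaults.
const sidecarSettings: HealthCheckSettings = { intervalSeconds: 30, retries: 3, startPeriodSeconds: 10 };

// In a FROM-scratch image every probe fails, because /bin/sh cannot be launched.
console.log(healthStatusOverTime(sidecarSettings, [false, false, false, false]));
// prints [ 'UNKNOWN', 'UNKNOWN', 'UNHEALTHY', 'UNHEALTHY' ]
```

Under this model the sidecar can never reach `HEALTHY` and is marked `UNHEALTHY` at the third probe, roughly 90 seconds after launch, regardless of how long the start period is.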
Here's a fairly minimal example that illustrates the problem: https://github.com/pauldoherty-optifly/fargateOtelExample
I could obviously take the aws-otel-collector container image, extend it, and publish it myself, but the documentation makes no mention of having to do that.
Hi @pauldoherty-optifly,
I am going to bring this to the team and see if we can provide an official recommendation. I will reach back out here when I have more information.
Thanks 👍
Hi @pauldoherty-optifly,
We do see the issue here. We are currently working on a solution and have added it to the backlog milestone. I will leave this issue open and make sure it is referenced when a PR with a fix is created.
We have now added the healthcheck component in the new ADOT Collector release, v0.23.0.
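For reference, here is a sketch of what a sidecar health check could look like once the healthcheck component is available. It rests on an assumption not stated in this thread: that v0.23.0+ images ship the component as a `/healthcheck` binary (the path used in ADOT's ECS setup documentation), which can be invoked in exec form without a shell; verify the path against the image you deploy. The interval and retries are the ECS defaults and the timeout/start period are copied from the issue, so treat the numbers as illustrative. `EcsHealthCheck` and `validateHealthCheck` are hypothetical names, and the ranges checked are the ones documented for the ECS `HealthCheck` parameters.

```typescript
// Sketch of an ECS container-definition health check for the ADOT sidecar.
// ASSUMPTION: the image (v0.23.0+) contains a /healthcheck binary; verify for your version.
interface EcsHealthCheck {
  command: string[];
  interval: number;    // seconds
  timeout: number;     // seconds
  retries: number;
  startPeriod: number; // seconds
}

// Exec form (CMD): no shell or curl is needed inside the FROM-scratch image.
const otelSidecarHealthCheck: EcsHealthCheck = {
  command: ["CMD", "/healthcheck"],
  interval: 30,
  timeout: 10,
  retries: 3,
  startPeriod: 10,
};

// Checks values against the ranges documented for the ECS HealthCheck API.
function validateHealthCheck(hc: EcsHealthCheck): string[] {
  const errors: string[] = [];
  const inRange = (name: string, value: number, min: number, max: number): void => {
    if (!Number.isInteger(value) || value < min || value > max) {
      errors.push(`${name}=${value} is outside ${min}-${max}`);
    }
  };
  inRange("interval", hc.interval, 5, 300);
  inRange("timeout", hc.timeout, 2, 60);
  inRange("retries", hc.retries, 1, 10);
  inRange("startPeriod", hc.startPeriod, 0, 300);
  if (hc.command[0] !== "CMD" && hc.command[0] !== "CMD-SHELL") {
    errors.push(`command must start with CMD or CMD-SHELL, got ${hc.command[0]}`);
  }
  return errors;
}

console.log(JSON.stringify({ healthCheck: otelSidecarHealthCheck }, null, 2));
console.log(validateHealthCheck(otelSidecarHealthCheck)); // prints []
```

In CDK the same values go into the container's `healthCheck` prop, with the time fields wrapped in `Duration.seconds(...)`. It is also worth confirming that the collector config passed with `--config` exposes whatever endpoint the healthcheck component probes; the original curl check targeted port 13133, the default port of the collector's `health_check` extension.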
Closing this issue, as the PR for it has been merged and is part of Collector v0.23.0.