aws-otel-collector
aws.ecs.service.name undefined
Describe the question
We have deployed aws-otel-collector as a sidecar container in ECS and have configured the awsecscontainermetrics receiver. We are sending the collected metrics to AMP via the awsprometheusremotewrite exporter.
We can see all the resource attributes and metric labels showing up just fine except aws.ecs.service.name, which is showing up as "undefined". Just wondering if this is the expected behavior?
Steps to reproduce if your question is related to an action
- Deploy aws-otel-collector as a sidecar container in ECS
- Define the receiver as awsecscontainermetrics and the exporter as awsprometheusremotewrite
What did you expect to see?
aws.ecs.service.name to display the correct value
Environment
ECS
Additional context
Config used for aws-otel-collector:
receivers:
  awsecscontainermetrics:
    collection_interval: 15s

processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - .*memory.reserved
          - .*memory.utilized
          - .*cpu.reserved
          - .*cpu.utilized
          - .*network.rate.rx
          - .*network.rate.tx
          - .*storage.read_bytes
          - .*storage.write_bytes
  resource:
    attributes:
      - key: aws.ecs.task.id
        action: delete
      - key: aws.ecs.task.pull_started_at
        action: delete
      - key: aws.ecs.task.pull_stopped_at
        action: delete
      - key: aws.ecs.task.arn
        action: delete
      - key: aws.ecs.container.image.id
        action: delete
      - key: aws.ecs.container.created_at
        action: delete
      - key: aws.ecs.container.finished_at
        action: delete
      - key: container.id
        action: delete
      - key: aws.ecs.container.exit_code
        action: delete
      - key: opencensus.resourcetype
        action: delete

exporters:
  awsprometheusremotewrite:
    endpoint: ${PROM_REMOTE_WRITE_ENDPOINT}
    resource_to_telemetry_conversion:
      enabled: true
    aws_auth:
      region: ${PROM_REMOTE_WRITE_ENDPOINT_REGION}
      role_arn: ${PROM_REMOTE_WRITE_IAM_ROLE}
  logging:
    loglevel: warn

service:
  pipelines:
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter, resource]
      exporters: [logging, awsprometheusremotewrite]
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/awsecscontainermetricsreceiver/internal/awsecscontainermetrics/resource.go#L66 Not sure why it is set like this; I will try to hunt down who wrote this code.
This requires ECS to provide the information through the ECS task metadata endpoint so that ADOT can retrieve it from there and publish it downstream. The timeline for this information becoming available from the metadata endpoint is unknown.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
The service name can be retrieved using the AWS SDK that is already a part of this library: the ECS describe-tasks endpoint response includes a group field which holds service:<ServiceName>. Could this be utilised to make this information available?
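For illustration only, here is a minimal sketch (not code from the receiver) of how the service name could be derived from that group field using the AWS SDK for Go v2. The cluster name and task ARN below are hypothetical placeholders; in the receiver they would presumably come from the task metadata endpoint.

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ecs"
)

// serviceNameFromGroup extracts the service name from the task "group"
// field, which ECS sets to "service:<ServiceName>" for service-managed tasks.
func serviceNameFromGroup(group string) (string, bool) {
	if name, ok := strings.CutPrefix(group, "service:"); ok {
		return name, true
	}
	return "", false // standalone task, not launched by a service
}

func main() {
	ctx := context.Background()

	// Hypothetical inputs for the sketch; in practice these would be read
	// from the ECS task metadata endpoint of the running task.
	cluster := "my-cluster"
	taskARN := "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abcdef1234567890"

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}
	client := ecs.NewFromConfig(cfg)

	out, err := client.DescribeTasks(ctx, &ecs.DescribeTasksInput{
		Cluster: aws.String(cluster),
		Tasks:   []string{taskARN},
	})
	if err != nil {
		panic(err)
	}
	if len(out.Tasks) == 0 {
		panic("task not found")
	}

	if name, ok := serviceNameFromGroup(aws.ToString(out.Tasks[0].Group)); ok {
		fmt.Println("aws.ecs.service.name =", name)
	}
}
```

Note that an approach like this adds an extra ECS API call (and the matching IAM permission) compared to reading the value straight from the metadata endpoint, which a later comment in this thread points out has since become possible.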
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been marked as stale for 30 days with no activity.
RIP to another interesting issue closed while waiting for an answer from AWS.
Hey @Kralizek, ADOT PM here. I will interpret your feedback as a sarcastic attempt to vent your frustration about the pace we work at. That's fair, not helpful, but fair.
Just because an issue is stale and hence auto-closed doesn't mean it's closed for good (in fact, I just re-opened it). Now, we will get to it in the fullness of time; I cannot at the moment advise on a timeline.
Thanks for your patience and if you have anything further to share that can contribute towards a resolution, please feel free to share.
@mhausenblas by no means do I want to disrespect the pace of work of any AWS team.
I'm just extremely critical of the setup you have for the automation that closes stale issues, as you can see from the conversation I tried to start here.
Take this specific case as an example:
- an issue was reported
- AWS employee(s) looked at it and said that they will look into the problem
now we are at a point where:
- customers like me won't post more to this issue because they think the issue will (eventually) be taken care of
- AWS employees won't add more because the poster was given an answer
Days go by and eventually the automation kicks in and closes the issue because it was marked as stale. The fact that the automation kicks in during the weekend, when people are not checking their notifications on GitHub, and closes the issue within a day doesn't help and only adds to the frustration.
I personally am really interested in this issue. What should I do? Should I post a "+1" every 3 months to make sure the issue doesn't go stale? That seems like just a dirty trick around a process that can and should be improved.
And sorry again if my post was perceived as a rant at your work pace. It's really not the case.
@Kralizek thanks for your feedback, which I find super useful and actionable. Two quick thoughts, and I will put this issue on our backlog and make sure we come up with a solid strategy ASAP:
- We make it very clear that this is an open source project for which we, via this site (GitHub), provide support on a best-effort basis.
- If you desire support with SLAs etc., then the option is to use ADOT with one of the supported compute platforms (EC2, ECS, EKS, Lambda, etc.) and/or destinations, and you can (and should!) create a support ticket. This requires that your organization has Enterprise Support. Also, in this case, there should be no GitHub issue here in the first place since it's handled by our internal trouble ticket system.
HTH and I will have an update for you in the context of ADOT by end of week.
@mhausenblas I appreciate your explanation a lot. I just want to emphasize that I understand that this is an OSS project and support is given on a best-effort basis. And I'm fine with waiting for it if and when its time comes. I just wish legitimate feature requests (like this one) or bug reports (like I've seen in other repositories) weren't artificially dismissed for the reasons I explained earlier.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
FYI.
ServiceName can be retrieved from metadata when using Amazon ECS container agent version 1.63.1 or later.
https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-ecs-metadata-attributes-tasks-running-amazon-ec2/ https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint-v4.html#task-metadata-endpoint-v4-response
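As a rough sketch of what this makes possible (assuming ECS agent 1.63.1 or later, and that the v4 task metadata response exposes the field as ServiceName per the linked docs), a container could read the service name directly from the metadata endpoint, without any extra AWS API calls:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// taskMetadata models only the fields we care about from the v4 task
// metadata response; ServiceName is expected to be populated for
// service-managed tasks on ECS agent 1.63.1 or later.
type taskMetadata struct {
	Cluster     string `json:"Cluster"`
	ServiceName string `json:"ServiceName"`
}

func main() {
	// ECS injects this environment variable into every container.
	base := os.Getenv("ECS_CONTAINER_METADATA_URI_V4")
	if base == "" {
		panic("not running on ECS: ECS_CONTAINER_METADATA_URI_V4 is unset")
	}

	resp, err := http.Get(base + "/task")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var md taskMetadata
	if err := json.NewDecoder(resp.Body).Decode(&md); err != nil {
		panic(err)
	}

	fmt.Println("cluster:", md.Cluster)
	fmt.Println("aws.ecs.service.name:", md.ServiceName) // empty for standalone tasks
}
```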
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
bump
As stated here, ServiceName is now available from the task metadata endpoint as of ECS container agent version 1.63.1 (also announced here, with docs here). The code for the awsecscontainermetricsreceiver will still need to be updated; I've created an issue to track those changes upstream, see here. I will continue to update this issue as it gets worked on upstream.
This PR upstream resolves the issue, and changes should be included in the next ADOT Collector release.