
aws.ecs.service.name undefined

Open chenfeilee opened this issue 3 years ago • 14 comments

Describe the question

We have deployed aws-otel-collector as a sidecar container in ECS and have configured awsecscontainermetrics. We are sending the collected metrics to AMP via awsprometheusremotewrite.

We can see all the resource attributes and metric labels showing up just fine, except aws.ecs.service.name, which shows up as "undefined". Is this the expected behavior?

Steps to reproduce if your question is related to an action

  1. Deploy aws-otel-collector as a sidecar container in ECS
  2. Define the receiver as awsecscontainermetrics and the exporter as awsprometheusremotewrite

What did you expect to see?

aws.ecs.service.name to display the correct value

Environment

ECS

Additional context

Config used for aws-otel-collector:

receivers:
  awsecscontainermetrics:
    collection_interval: 15s
processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names:
        - .*memory.reserved
        - .*memory.utilized
        - .*cpu.reserved
        - .*cpu.utilized
        - .*network.rate.rx
        - .*network.rate.tx
        - .*storage.read_bytes
        - .*storage.write_bytes
  resource:
    attributes:
    - key: aws.ecs.task.id
      action: delete
    - key: aws.ecs.task.pull_started_at
      action: delete 
    - key: aws.ecs.task.pull_stopped_at
      action: delete 
    - key: aws.ecs.task.arn
      action: delete 
    - key: aws.ecs.container.image.id
      action: delete 
    - key: aws.ecs.container.created_at
      action: delete 
    - key: aws.ecs.container.finished_at
      action: delete 
    - key: container.id
      action: delete 
    - key: aws.ecs.container.exit_code
      action: delete 
    - key: opencensus.resourcetype
      action: delete 
exporters:
  awsprometheusremotewrite:
    endpoint: ${PROM_REMOTE_WRITE_ENDPOINT}
    resource_to_telemetry_conversion:
      enabled: true
    aws_auth:
      region: ${PROM_REMOTE_WRITE_ENDPOINT_REGION}
      role_arn: ${PROM_REMOTE_WRITE_IAM_ROLE}
  logging:
    loglevel: warn
service:
  pipelines:
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter, resource]
      exporters: [logging, awsprometheusremotewrite]

chenfeilee avatar Jan 31 '22 09:01 chenfeilee

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/awsecscontainermetricsreceiver/internal/awsecscontainermetrics/resource.go#L66

Not sure why it is set like this. I will try to hunt down who wrote this code.

sethAmazon avatar Feb 01 '22 18:02 sethAmazon

This requires ECS to provide the information through the ECS metadata endpoint, so that ADOT can read it from there and publish it downstream. The timeline for this information becoming available from the metadata endpoint is unknown.

lubingfeng avatar Feb 01 '22 19:02 lubingfeng

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Apr 03 '22 20:04 github-actions[bot]

The service name can be retrieved using the AWS SDK that is already a part of this library: the ECS describe-tasks endpoint response includes a field group which holds service:<ServiceName>. Could this be utilised to make this information available?
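
A minimal sketch of this idea, written in Python with boto3 purely for illustration (the collector itself is written in Go, and this is not its code). The helper names `service_name_from_group` and `lookup_service_name` are hypothetical. It assumes the caller has permission to call `ecs:DescribeTasks` and that tasks not started by a service carry a `group` value without the `service:` prefix.

```python
from typing import Optional

# Prefix described above: the group field holds "service:<ServiceName>".
SERVICE_PREFIX = "service:"


def service_name_from_group(group: Optional[str]) -> Optional[str]:
    """Return the service name encoded in a task's `group` field, if any."""
    if group and group.startswith(SERVICE_PREFIX):
        name = group[len(SERVICE_PREFIX):]
        return name or None
    return None


def lookup_service_name(cluster: str, task_arn: str) -> Optional[str]:
    """Hypothetical helper: describe the task and parse its group field."""
    import boto3  # imported lazily so the parser above works without boto3

    ecs = boto3.client("ecs")
    response = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
    tasks = response.get("tasks", [])
    if not tasks:
        return None
    return service_name_from_group(tasks[0].get("group"))
```

The parsing step is kept separate from the API call so it can be checked without AWS credentials, for example `service_name_from_group("service:checkout")` yields `"checkout"`.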

lxop avatar Apr 12 '22 21:04 lxop

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Jun 12 '22 20:06 github-actions[bot]

This issue was closed because it has been marked as stale for 30 days with no activity.

github-actions[bot] avatar Jul 17 '22 20:07 github-actions[bot]

RIP to another interesting issue closed while waiting for an answer by AWS.

Kralizek avatar Jul 18 '22 08:07 Kralizek

Hey @Kralizek, ADOT PM here. I will interpret your feedback as a sarcastic attempt to vent your frustration about the pace at which we work. That's fair, not helpful, but fair.

Just because an issue is stale and hence auto-closed doesn't mean it's closed for good (in fact, I just re-opened it). We will get to it in the fullness of time, but at the moment I cannot yet advise on a timeline.

Thanks for your patience and if you have anything further to share that can contribute towards a resolution, please feel free to share.

mhausenblas avatar Jul 18 '22 08:07 mhausenblas

@mhausenblas by no means do I want to disrespect the pace of work of any AWS team.

I'm just extremely critical of the way you have set up the automation to close stale issues, as you can see from the conversation I tried to start here.

Take this specific case as an example:

  • an issue was reported
  • AWS employee(s) have looked into it and said that they will look into the problem

now we are at a point where:

  • customers like me won't post more to this issue because they think the issue will (eventually) be taken care of
  • AWS employees won't add more because the poster was given an answer

Days go by and eventually the automation kicks in and closes the issue because it was marked as stale. The fact that the automation kicks in during the weekend, when people are not checking their notifications on GH, and closes the issue within a day doesn't help and only adds to the frustration.

I personally am really interested in this issue. What should I do? Should I post a "+1" every 3 months to make sure the issue doesn't get stale? That seems just a dirty trick around a process that can and should be improved.

And sorry again if you felt that my post was a rant about your work pace. That's really not the case.

Kralizek avatar Jul 18 '22 09:07 Kralizek

@Kralizek thanks for your feedback, which I find super useful and actionable. Two quick thoughts, and then I will put this issue on our backlog and make sure we come up with a solid strategy ASAP:

  1. We make it very clear that this is an open source project for which we, via this site (GitHub), provide support on a best-effort basis.
  2. If you desire support with an SLA etc., then the option is to use ADOT with one of the supported compute options (EC2, ECS, EKS, Lambda, etc.) and/or destinations, and you can (and should!) create a support ticket. This requires that your organization has Enterprise Support. Also, in this case, there should be no GitHub issue here in the first place, since it's handled by our internal trouble ticket system.

HTH and I will have an update for you in the context of ADOT by end of week.

mhausenblas avatar Jul 18 '22 09:07 mhausenblas

@mhausenblas I appreciate your explanation a lot. I just want to emphasize that I understand that this is an OSS project and that support is given on a best-effort basis. And I'm fine with waiting until (or if) its time comes. I just wish legitimate feature requests (like this one) or bug reports (like I've seen in other repositories) weren't artificially dismissed for the reasons I explained earlier.

Kralizek avatar Jul 18 '22 09:07 Kralizek

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Sep 18 '22 20:09 github-actions[bot]

.

Kralizek avatar Sep 18 '22 22:09 Kralizek

FYI.

ServiceName can be retrieved from metadata when using Amazon ECS container agent version 1.63.1 or later.

https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-ecs-metadata-attributes-tasks-running-amazon-ec2/
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint-v4.html#task-metadata-endpoint-v4-response
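
As a rough illustration of reading that field from inside a container, here is a small Python sketch (not the receiver's actual Go implementation). It assumes the container has the `ECS_CONTAINER_METADATA_URI_V4` environment variable set by ECS and that the `/task` response carries a `ServiceName` key when the task belongs to a service. The function names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional


def service_name_from_task_metadata(metadata: Dict[str, Any]) -> Optional[str]:
    """Return ServiceName from a parsed task metadata v4 response, or None."""
    name = metadata.get("ServiceName")
    return name if isinstance(name, str) and name else None


def fetch_task_metadata(timeout: float = 2.0) -> Dict[str, Any]:
    """Fetch the task metadata document; only works inside an ECS task."""
    base = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{base}/task", timeout=timeout) as resp:
        return json.load(resp)
```

On an older agent, or for a task that is not part of a service, the key is expected to be absent, so the parser returns `None` and leaves the choice of fallback value to the caller.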

0nihajim avatar Oct 08 '22 03:10 0nihajim

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Dec 11 '22 20:12 github-actions[bot]

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Mar 12 '23 20:03 github-actions[bot]

bump

Kralizek avatar Mar 13 '23 09:03 Kralizek

As stated here, ServiceName is now available from the task metadata endpoint as of ECS container agent version 1.63.1 (also announced here, with docs here). The code for the awsecscontainermetricsreceiver will still need to be updated; I've created an issue to track those changes upstream (see here). I will continue to update this issue as the work progresses upstream.

erichsueh3 avatar Mar 15 '23 00:03 erichsueh3

This upstream PR resolves the issue, and the changes should be included in the next ADOT Collector release.

erichsueh3 avatar Mar 24 '23 22:03 erichsueh3