Grafana-agent incorrectly tags traces received from Kubernetes pods
Describe the bug We have deployed grafana-agent to AWS EKS to collect traces. We were seeing different tag values (via Kubernetes service discovery) on different spans of the same trace.
It seems this happened because the running pod (which the traces were being sent from) had the same private IP address as another, already completed pod.
To Reproduce Steps to reproduce the behavior:
- Deploy grafana-agent on AWS EKS
- Start sending traces to grafana-agent
- Look for trace in Grafana
Expected behavior Spans are tagged correctly (in this case, only with metadata from the pod that was running at the time the traces were received).
Environment:
- Infrastructure: AWS EKS
Additional Context
@joe-elliott any thoughts on this? thanks.
Hi @chenfeilee! The Grafana Agent uses Prometheus Service Discovery to get metadata from Kubernetes, so it should be possible to drop Succeeded pods using the PromSD config.
When using role: pod, there is a metadata label __meta_kubernetes_pod_phase that contains the pod's phase. You can use that label in a relabel config to drop targets whose phase is Succeeded. This way, completed pods will be dropped and traces will be tagged only with Running pods' metadata.
Example config:
traces:
  configs:
    - name: <name>
      ...
      scrape_configs:
        - job_name: <job-name>
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - action: drop
              source_labels:
                - __meta_kubernetes_pod_phase
              regex: Succeeded
      ....
@mapno I have given it a try and it works! Thank you so much!
One question though: do I have to drop the Failed pods too to avoid the same issue? Or does the issue only occur with Succeeded pods?
That's great to hear!
One question though: do I have to drop the Failed pods too to avoid the same issue? Or does the issue only occur with Succeeded pods?
I think so, yes. An alternative is using action: keep to keep only targets that match Running. That should be foolproof, since pods can also be in the Pending and Unknown phases, and I don't think you'll want to keep any of those either.
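For reference, a minimal sketch of that keep-based alternative, assuming the same scrape config layout as the drop example above (<name> and <job-name> are placeholders):
traces:
  configs:
    - name: <name>
      scrape_configs:
        - job_name: <job-name>
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Keep only targets whose pod phase is Running; everything else
            # (Succeeded, Failed, Pending, Unknown) is dropped from discovery.
            - action: keep
              source_labels:
                - __meta_kubernetes_pod_phase
              regex: Running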
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.